Stochastic blockmodels are generative network models where the vertices are separated into discrete groups, and the probability of an edge existing between two vertices is determined solely by their group membership. In this paper, we derive expressions for the entropy of stochastic blockmodel ensembles. We consider several ensemble variants, including the traditional model as well as the newly introduced degree-corrected version [Karrer et al. Phys. Rev. E 83, 016107 (2011)], which imposes a degree sequence on the vertices, in addition to the block structure. The imposed degree sequence is implemented both as "soft" constraints, where only the expected degrees are imposed, and as "hard" constraints, where they are required to be the same on all samples of the ensemble. We also consider generalizations to multigraphs and directed graphs. We illustrate one of many applications of this measure by directly deriving a log-likelihood function from the entropy expression, and using it to infer latent block structure in observed data. Due to the general nature of the ensembles considered, the method works well for ensembles with intrinsic degree correlations (i.e. with entropic origin) as well as extrinsic degree correlations, which go beyond the block structure.



I. INTRODUCTION 

Stochastic blockmodels are random graph ensembles, in which vertices are separated into discrete groups (or "blocks"), and the probability of an edge existing between two vertices is determined according to their group membership. This class of model (together with many variants which incorporate several other details [3–6]) has been used extensively in the social sciences, where the blocks usually represent the roles played by different social agents. In this context, it has been used mainly as a tool to infer latent structure in empirical data. More recently, it has been applied as an alternative to the more specific task of community detection [7], which focuses solely on densely connected communities of vertices [8–15]. In addition to its usefulness in this context, stochastic blockmodels serve as a general framework which has many potential applications, such as the parametrization of network topologies on which dynamical processes can occur [16, 17], and the modelling of adaptive networks, where the topology itself can vary according to dynamical rules [15].

The standard stochastic blockmodel formulation [1] assumes that all vertices belonging to the same block are statistically indistinguishable, which means that they all have the same expected degree. This restriction is not very attractive for a general model, since many observed networks show an extreme variation of degrees, even between vertices perceived to be of the same block (or "community"). Recently, this class of model has been augmented by the introduction of the "degree-corrected" variant [12], which incorporates such degree variation, and was shown to be a much better model for many empirical networks. With this modification, the stochastic blockmodel becomes more appealing, since (except for the degrees) it only discards local-scale properties of the network topology (such as clustering, motifs, etc. [19]), but can represent arbitrary global or mesoscale properties well, such as assortativity/dissortativity [20], community structure [7, 21], bipartite and multipartite adjacency, and many others.

In this work, we focus on the microcanonical entropy [22–25] of stochastic blockmodel ensembles, defined as $S = \ln\Omega$, where $\Omega$ is the number of graphs in the ensemble. This quantity has the traditional interpretation of measuring the degree of "order" of a given ensemble, which is more disordered (i.e. random) if the entropy is larger. It is also a thermodynamic potential, which, in conjunction with other appropriate quantities such as energy (representing different sorts of interactions, such as homophily in social systems or robustness in biological dynamical models [15]), can be used to describe the equilibrium properties of evolved network systems […].

From the entropy S one can directly derive the log-likelihood function $\mathcal{L} = \ln\mathcal{P}$, where $\mathcal{P}$ is the probability of observing a given network realization, which is often used in the blockmodel literature. Assuming that each graph in the ensemble is realized with the same probability, $\mathcal{P} = 1/\Omega$, we have simply that $\mathcal{L} = -S$. The log-likelihood can be used to infer the most likely block structure which matches given network data, and thus plays a central role in the context of blockmodel detection. However, the expressions for the log-likelihood $\mathcal{L}$, as they are often derived in the stochastic blockmodel literature, do not allow one to directly obtain the entropy, either because they are expressed in non-closed form […, 13, …, 38], or because they only contain terms which depend on an a posteriori partition of a sample network, with the remaining terms neglected [10, …].

In this work, we derive expressions for the entropy of elementary variations of the blockmodel ensembles. The choice of microcanonical ensembles permits the use of straightforward combinatorics, which simplifies the analysis. We consider both the traditional and degree-corrected variants of the model, as well as their implementations as ensembles of multigraphs (with parallel edges and self-loops allowed) and simple graphs (no parallel edges or self-loops allowed). The degree-corrected variants considered here represent a generalization of the original definition [12], since arbitrary nearest-neighbour degree correlations are also allowed. For the degree-corrected variants, we consider the imposed degree sequence on the vertices both as "soft" and "hard" constraints. When the degree constraints are "soft", it is assumed that the imposed degree on each vertex is only an average over the ensemble, and its values over sampled realizations are allowed to fluctuate. With "hard" constraints, on the other hand, the degree sequence is required to be exactly the same on all samples of the ensemble. We also consider the directed versions of all ensembles. These represent further refinements of the original definition [12], which considered only undirected graphs with "soft" degree constraints.

The entropy expressions derived represent generalizations of several expressions found in the literature for the case without block structure [24, …], which are easily recovered by setting the number of blocks to one.

As a direct application of the derived entropy functions, we use them to define a log-likelihood function $\mathcal{L}$, which can be used to detect the most likely blockmodel partition which fits given network data. We show that these estimators work very well to detect block structures in networks where there are intrinsic (as in the case of simple graphs with broad degree distributions) or extrinsic degree correlations. In particular, the expressions derived in this work perform better for networks with broad degree distributions than the sparse approximation derived in [12], which may result in suboptimal partitions.

This paper is organized as follows. In Sec. II we define the traditional and degree-corrected stochastic blockmodel ensembles. In Secs. III and IV we systematically derive analytical expressions for the most fundamental ensemble variants, including simple graphs (Sec. III) and multigraphs (Sec. IV), both the traditional and (soft) degree-corrected versions, as well as the undirected and directed cases. In Sec. V we obtain the entropy for the degree-corrected ensembles with hard degree constraints, for the same variants described in the other sections. In Sec. VI we apply the derived entropy expression for the soft degree-corrected ensemble to the problem of blockmodel detection, by using it as a log-likelihood function. [Readers more interested in the application to blockmodel detection can read Secs. II to III B and then move directly to Sec. VI.] We finalize in Sec. VII with a conclusion.



FIG. 1. (Color online) Example of a traditional stochastic blockmodel with six blocks of equal size, and matrix $e_{rs}$ given on the left (each square is a matrix element, and its size corresponds to its magnitude). On the right is a sample of this
ensemble with 10^ vertices. 



II. TRADITIONAL AND DEGREE-CORRECTED BLOCKMODELS

The traditional blockmodel ensemble is parametrized as follows: There are N vertices, partitioned into B blocks, and $n_r$ is the number of vertices in block $r \in [0, B-1]$. The matrix $e_{rs}$ specifies the number of edges between blocks r and s, which are randomly placed. As a matter of convenience, the diagonal elements $e_{rr}$ are defined as twice the number of edges internal to the block r (or equivalently, the number of "half-edges"). An example of a specific choice of parameters can be seen in Fig. 1.

This is a "microcanonical" formulation of the usual "canonical" form which specifies instead the probability $w_{rs}$ of an edge occurring between two vertices belonging to blocks r and s, so that the expected number of edges $\langle e_{rs}\rangle = E w_{rs}$ is allowed to fluctuate, where E is the total number of edges. If the nonzero values of $e_{rs}$ are sufficiently large, these two ensembles become equivalent, since in this case fluctuations around the mean value can be neglected.

The degree-corrected variant [12] further imposes a degree sequence $\{k_i\}$ on each vertex $i \in [0, N-1]$ of the network, which must be obeyed in addition to the block structure specified by $n_r$ and $e_{rs}$. This restriction may be imposed in two different ways. The first approach assumes these constraints are "soft", and each individual degree $k_i$ represents only the average value of the degree of vertex i over all samples of the ensemble [42–45] (this is the original ensemble defined in [12]). Here, we will also consider a second approach which assumes the degree constraints are "hard", and the imposed degree sequence must be exactly the same in all samples of the ensemble. We will obtain the entropy for both these ensembles in the following.

III. SIMPLE GRAPH ENSEMBLES



A. Standard stochastic blockmodel 

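In simple graphs there can be at most one edge between two vertices. Therefore, we can enumerate the total number of different edge choices between blocks r and s as,

$$\Omega_{rs} = \binom{n_r n_s}{e_{rs}}, \qquad \Omega_{rr} = \binom{n_r(n_r-1)/2}{e_{rr}/2}, \tag{1}$$

which leads to the total number of graphs,

$$\Omega = \prod_{r \ge s}\Omega_{rs}. \tag{2}$$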
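A uniform draw from this ensemble simply picks, independently for every block pair, one of the $\Omega_{rs}$ possible edge choices counted in Eq. (1). The Python sketch below does exactly that with the standard library. The function name `sample_simple_sbm` and its inputs (a list `n` of block sizes and a symmetric matrix `e` following the doubled-diagonal convention of Sec. II) are illustrative choices, not part of the original text; listing every candidate vertex pair keeps the sketch readable but only suits small blocks.

```python
import itertools
import random


def sample_simple_sbm(n, e, seed=None):
    """Draw one simple graph uniformly from the ensemble counted by Eqs. (1)-(2).

    n    -- list of block sizes n_r
    e    -- symmetric B x B list of edge counts; e[r][r] is *twice* the
            number of edges inside block r (Sec. II convention)
    seed -- optional seed for reproducibility
    Returns a list of undirected edges (i, j) on vertices 0 .. N-1.
    """
    rng = random.Random(seed)
    B = len(n)
    # Vertices of block r are labelled offset[r], ..., offset[r] + n[r] - 1.
    offset = [sum(n[:r]) for r in range(B)]
    edges = []
    for r in range(B):
        for s in range(r, B):
            if r == s:
                # Unordered pairs of distinct vertices inside r: n_r(n_r - 1)/2 of them.
                pairs = list(itertools.combinations(range(n[r]), 2))
                chosen = rng.sample(pairs, e[r][r] // 2)
            else:
                # All n_r * n_s pairs with one end in r and the other in s.
                pairs = list(itertools.product(range(n[r]), range(n[s])))
                chosen = rng.sample(pairs, e[r][s])
            # Every subset of the required size is equally likely: one of Omega_rs.
            edges.extend((offset[r] + i, offset[s] + j) for i, j in chosen)
    return edges


edges = sample_simple_sbm([3, 4], [[4, 5], [5, 6]], seed=1)
print(len(edges))  # prints 10 (2 + 3 internal edges, 5 between blocks)
```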

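The entropy is obtained by $S_g = \ln\Omega$. Considering the values of $n_r$ large enough so that Stirling's approximation can be used, expressed as $\ln\binom{N}{m} \approx N H(m/N)$, where $H(x)$ is the binary entropy function,

$$H(x) = -x\ln x - (1-x)\ln(1-x), \tag{3}$$

whose series expansion is

$$H(x) = -x\ln x + x - \sum_{l=1}^{\infty}\frac{x^{l+1}}{l(l+1)}, \tag{4}$$

we obtain the compact expression,

$$S_g = \frac{1}{2}\sum_{rs} n_r n_s\, H\!\left(\frac{e_{rs}}{n_r n_s}\right). \tag{5}$$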

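Eq. (5) has been derived by other means in [10] (expressed as a log-likelihood function), for the canonical variant of the ensemble. Making use of the series expansion given by Eq. (4), the entropy can be written alternatively as

$$S_g = E - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{n_r n_s}\right) - \frac{1}{2}\sum_{rs} n_r n_s\sum_{l=1}^{\infty}\frac{1}{l(l+1)}\left(\frac{e_{rs}}{n_r n_s}\right)^{l+1}, \tag{6}$$

where $E = \sum_{rs} e_{rs}/2$ is the total number of edges in the network. The leading terms in the last sum in the previous expression are of the order $O(e_{rs}^2/n_r n_s)$. This number is typically of the order $\sim\langle k\rangle^2$, where $\langle k\rangle$ is the average degree of the network. Since the other terms of the expression are of order $\sim\langle k\rangle N$, and one often has that $\langle k\rangle \ll N$, the last term can be dropped, which leads to,

$$S_g \approx E - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{n_r n_s}\right). \tag{7}$$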

The last term of Eq. (7) is compatible with the equivalent expression for the log-likelihood derived in [12]. We note that while this limit can be assumed in many practical scenarios, one can also easily imagine ensembles which are "globally sparse" (i.e. $\langle k\rangle \ll N$), but "locally dense", with $\langle k\rangle_r = e_r/n_r \sim n_s$, for any two blocks r, s (with $e_r = \sum_s e_{rs}$ being the total number of half-edges adjacent to block r). In such scenarios Eq. (7) will neglect potentially important contributions to the entropy, and therefore Eqs. (5) or (6) should be used instead.
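To get a feel for when the sparse approximation is adequate, one can compare the exact $\ln\Omega$ from Eqs. (1) and (2), the compact form of Eq. (5), and the truncated form of Eq. (7). The Python sketch below is a minimal illustration under our own naming (`log_omega_exact`, `S_eq5` and `S_eq7` are hypothetical helpers), and the two parameter sets at the end are arbitrary examples of a sparse and a locally dense ensemble.

```python
from math import lgamma, log


def log_binom(a, b):
    """Natural log of the binomial coefficient C(a, b), via log-gamma."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def H(x):
    """Binary entropy function of Eq. (3), with H(0) = H(1) = 0."""
    if x == 0 or x == 1:
        return 0.0
    return -x * log(x) - (1 - x) * log(1 - x)


def log_omega_exact(n, e):
    """ln(Omega) from Eqs. (1)-(2): undirected simple graphs, doubled diagonal."""
    B = len(n)
    total = 0.0
    for r in range(B):
        total += log_binom(n[r] * (n[r] - 1) // 2, e[r][r] // 2)
        for s in range(r + 1, B):
            total += log_binom(n[r] * n[s], e[r][s])
    return total


def S_eq5(n, e):
    """Compact Stirling form, Eq. (5)."""
    B = len(n)
    return 0.5 * sum(n[r] * n[s] * H(e[r][s] / (n[r] * n[s]))
                     for r in range(B) for s in range(B))


def S_eq7(n, e):
    """Sparse approximation, Eq. (7); pairs with e_rs = 0 contribute nothing."""
    B = len(n)
    E = sum(map(sum, e)) / 2
    return E - 0.5 * sum(e[r][s] * log(e[r][s] / (n[r] * n[s]))
                         for r in range(B) for s in range(B) if e[r][s] > 0)


sparse = ([1000, 1000], [[4000, 1000], [1000, 4000]])
dense = ([20, 20], [[300, 100], [100, 300]])
for n, e in (sparse, dense):
    print(log_omega_exact(n, e), S_eq5(n, e), S_eq7(n, e))
```

In the sparse example the three values agree to better than 0.1%, whereas in the locally dense example Eq. (7) overestimates Eq. (5) by almost 40%, since the terms dropped from Eq. (6) are all negative.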

As shown in [12], the second term of Eq. (7) can be slightly rewritten as the Kullback-Leibler divergence […] between the actual and expected distributions of block assignments at the opposing ends of randomly chosen edges, where the expected distribution takes into account only the size of each block. This can be interpreted as the amount of additional information required to encode a given block partition, if one assumes a priori that the number of edges incident to each block is proportional to its size.
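One way to make the Kullback-Leibler reading explicit (our own algebra, using notation that does not appear in the paper): define $p_{rs} = e_{rs}/2E$ and $q_{rs} = n_r n_s/N^2$. Then $\frac{1}{2}\sum_{rs} e_{rs}\ln(e_{rs}/n_r n_s) = E\,[D_{KL}(p\,\|\,q) + \ln(2E/N^2)]$, so for fixed N and E the second term of Eq. (7) equals $-E\,D_{KL}(p\,\|\,q)$ up to a constant. The Python sketch below (hypothetical function name `kl_block_divergence`) computes this divergence and checks the identity numerically on an arbitrary example.

```python
from math import log


def kl_block_divergence(n, e):
    """D_KL(p || q) with p_rs = e_rs / 2E and q_rs = n_r n_s / N^2.

    p is the observed distribution of block pairs at the two ends of a
    randomly chosen edge; q is what one would expect if edge ends were
    distributed among blocks purely in proportion to block sizes.
    """
    B = len(n)
    N = sum(n)
    two_E = sum(map(sum, e))
    kl = 0.0
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                p = e[r][s] / two_E
                q = n[r] * n[s] / N ** 2
                kl += p * log(p / q)
    return kl


n, e = [1000, 1000], [[4000, 1000], [1000, 4000]]
E = sum(map(sum, e)) / 2
lhs = 0.5 * sum(e[r][s] * log(e[r][s] / (n[r] * n[s])) for r in range(2) for s in range(2))
rhs = E * (kl_block_divergence(n, e) + log(2 * E / sum(n) ** 2))
print(lhs, rhs)  # equal up to floating-point rounding
```

When $e_{rs}$ is exactly proportional to $n_r n_s$ the divergence vanishes, matching the intuition that such a partition carries no extra information beyond block sizes.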



1. Directed graphs 

The ensemble of directed blockmodels can be analysed in an analogous fashion. The only difference is that for the directed version, the matrix $e_{rs}$ can be asymmetric, and one needs to differentiate between the number of edges leaving block r, $e_r^+ = \sum_s e_{rs}$, and the number of edges arriving, $e_r^- = \sum_s e_{sr}$. The number of edge choices $\Omega_{rs}$ is given exactly as in Eq. (1), the only difference being that one no longer needs to differentiate the diagonal term, which in this case becomes $\Omega_{rr} = \Omega_{rs}|_{s=r}$. Since the matrix $e_{rs}$ is in general asymmetric, the total number of graphs becomes the product over all directed r, s pairs,

$$\Omega = \prod_{rs}\Omega_{rs}. \tag{8}$$

Therefore the entropy becomes simply,

$$S_g = \sum_{rs} n_r n_s\, H\!\left(\frac{e_{rs}}{n_r n_s}\right), \tag{9}$$

which is identical to Eq. (5) except for a factor 1/2. (Note that for directed graphs we define $e_{rr}$ as the number of edges internal to block r, not twice this value as in the undirected case.) Naturally, the same alternative expression as in Eq. (6) can be written, as well as the same approximation as in Eq. (7), which will be identical except for a factor 1/2.

B. Degree-corrected ensembles with "soft" 
constraints 

Following [12], we introduce degree variability to the blockmodel ensemble defined previously, by imposing an expected degree sequence $\{\kappa_i\}$ on all vertices of the graph, in addition to their block membership. Thus each individual $\kappa_i$ represents only the average value of the degree of vertex i over all samples of the ensemble. Such "soft" degree constraints are relatively easy to implement, since one needs only to extend the non-degree-corrected version, simply by artificially separating vertices with given imposed expected degrees into different degree blocks. Thus, each existent block is labeled by a pair $(r,\kappa)$, where the first value is the block label itself, and the second is the expected degree label. In order for the label $(r,\kappa)$ to be meaningful, we need to have intrinsically that $e_{(r,\kappa)} = \sum_{s\kappa'} e_{(r,\kappa),(s,\kappa')} = \kappa\, n_{(r,\kappa)}$, such that the average degree of vertices in block $(r,\kappa)$ is exactly $\kappa$. This results in an ensemble with KB blocks, where K is the total number of different expected degrees, $n_{(r,\kappa)}$ is the number of vertices in block $(r,\kappa)$, and $e_{(r,\kappa),(s,\kappa')}$ is the number of edges between $(r,\kappa)$ and $(s,\kappa')$. Inserting this block structure into Eq. (5) one obtains



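$$S_{gs} = \frac{1}{2}\sum_{r\kappa s\kappa'} n_{(r,\kappa)}\, n_{(s,\kappa')}\, H\!\left(\frac{e_{(r,\kappa),(s,\kappa')}}{n_{(r,\kappa)}\, n_{(s,\kappa')}}\right). \tag{10}$$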



This ensemble accommodates not only blockmodels with arbitrary (expected) degree sequences, but also with arbitrary degree correlations, since it is defined as a function of the full matrix $e_{(r,\kappa),(s,\kappa')}$. (It is therefore a generalization of the ensemble defined in [12].) However, it is often more useful to consider the less-constrained ensemble where one restricts only the total number of edges between blocks, irrespective of their expected degrees,

$$e_{rs} = \sum_{\kappa\kappa'} e_{(r,\kappa),(s,\kappa')}. \tag{11}$$



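This can be obtained by maximizing the entropy $S_{gs}$, subject to this constraint. Carrying out this maximization, one arrives at the following nonlinear system,

$$e_{(r,\kappa),(s,\kappa')} = \frac{n_{(r,\kappa)}\, n_{(s,\kappa')}}{\exp(\lambda_{rs} + \mu_{r\kappa} + \mu_{s\kappa'}) + 1}, \tag{12}$$

$$e_{rs} = \sum_{\kappa\kappa'} e_{(r,\kappa),(s,\kappa')}, \tag{13}$$

$$\kappa\, n_{(r,\kappa)} = \sum_{s\kappa'} e_{(r,\kappa),(s,\kappa')}, \tag{14}$$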



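which must be solved for $\{e_{(r,\kappa),(s,\kappa')}, \lambda_{rs}, \mu_{r\kappa}\}$, where $\{\lambda_{rs}\}$ and $\{\mu_{r\kappa}\}$ are Lagrange multipliers which impose the necessary constraints, described by Eqs. (13) and (14), respectively. Unfortunately, this system admits no general closed-form solution. However, if one makes the assumption that $\exp(\lambda_{rs} + \mu_{r\kappa} + \mu_{s\kappa'}) \gg 1$, one obtains the approximate solution,

$$e_{(r,\kappa),(s,\kappa')} \approx \frac{e_{rs}\,\kappa\kappa'\, n_{(r,\kappa)}\, n_{(s,\kappa')}}{e_r e_s}. \tag{15}$$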



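This is often called the "sparse" or "classical" limit [29], and corresponds to the limit where intrinsic degree correlations between any two blocks r and s can be neglected […]. Eq. (15) is intuitively what one expects for uncorrelated degree-corrected blockmodels: The number of edges between $(r,\kappa)$ and $(s,\kappa')$ is proportional to the number of edges between the two blocks, $e_{rs}$, and to the degree values themselves, $\kappa\kappa'$. Including this in Eq. (10) and using Eq. (4), one obtains,

$$S_{gsu} \approx E - \sum_\kappa N_\kappa\,\kappa\ln\kappa - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r e_s}\right) - \frac{1}{2}\sum_{rs} n_r n_s\sum_{l=1}^{\infty}\frac{1}{l(l+1)}\left(\frac{e_{rs}}{e_r e_s}\right)^{l+1}\langle\kappa^{l+1}\rangle_r\langle\kappa^{l+1}\rangle_s, \tag{16}$$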



where $N_\kappa = \sum_r n_{(r,\kappa)}$ is the total number of vertices with expected degree $\kappa$, and $\langle\kappa^l\rangle_r = \sum_{i\in r}\kappa_i^l/n_r$ is the l-th moment of the expected degree sequence of vertices in block r. It is interesting to compare this expression with the entropy $S_g$ for the non-degree-corrected ensemble, Eq. (6). The importance of the terms in the last sum of Eq. (16) will depend strongly on the properties of the expected degree sequence $\{\kappa_i\}$. Irrespective of its average value, if the higher moments $\langle\kappa^l\rangle_r$ of a given block r are large, so will be their contribution to the entropy. Therefore these terms cannot be neglected a priori for all expected degree sequences, regardless of the values of the first moments $\langle\kappa\rangle_r$. Only if one makes the (relatively strong) assumption that,

$$n_r n_s\left(\frac{e_{rs}}{e_r e_s}\right)^{l+1}\langle\kappa^{l+1}\rangle_r\langle\kappa^{l+1}\rangle_s \ll e_{rs}, \tag{17}$$

for any $l > 0$, can Eq. (16) be rewritten as,

$$S_{gsu} \approx E - \sum_\kappa N_\kappa\,\kappa\ln\kappa - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r e_s}\right). \tag{18}$$



The last term of Eq. (18) is compatible with the expression for the log-likelihood derived in [12], for the degree-corrected ensemble. It is interesting to note that, in this limit, the block partition of the network and the expected degree sequence contribute to independent terms of the entropy. This means that the expected degrees can be distributed in any way among the vertices of all blocks, without any entropic cost, as long as the expected degree distribution is always the same. Furthermore, as shown in [12], the last term of Eq. (18) can also be rewritten as the Kullback-Leibler divergence between the actual and expected distributions of block assignments at the opposing ends of randomly chosen edges, similarly to the non-degree-corrected blockmodels. The main difference now is that the expected distribution is expressed in terms of the total number of half-edges $e_r$ leaving block r, instead of the block size $n_r$. Equivalently, the last term corresponds (after slight modifications) to the mutual information of block memberships at the ends of randomly chosen edges.
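Because Eq. (16) is obtained by inserting Eq. (15) into Eq. (10), the two can be checked against each other numerically: summing $H(e_{rs}\kappa_i\kappa_j/e_r e_s)$ over all vertex pairs should reproduce Eq. (16) once enough terms of the series are kept, and dropping the series gives Eq. (18). The Python sketch below does this for a small hypothetical example. The name `soft_dc_entropies`, the truncation order `L`, and the input conventions (per-vertex block labels `b`, positive expected degrees `kappa` whose block sums equal $e_r$, and a doubled-diagonal matrix `e`) are our assumptions.

```python
from collections import Counter
from math import log


def H(x):
    """Binary entropy, Eq. (3); raises if x leaves [0, 1], i.e. if Eq. (19) fails."""
    if not 0.0 <= x <= 1.0:
        raise ValueError(f"H(x) undefined for x = {x}")
    if x == 0 or x == 1:
        return 0.0
    return -x * log(x) - (1 - x) * log(1 - x)


def soft_dc_entropies(b, kappa, e, L=60):
    """Return (Eq. (10) with Eq. (15) inserted, Eq. (16) truncated at L, Eq. (18))."""
    B, N = len(e), len(b)
    e_r = [sum(row) for row in e]
    E = sum(e_r) / 2
    n = Counter(b)
    # Moments <kappa^k>_r of the expected degrees inside each block.
    mom = {(r, k): sum(kk ** k for bb, kk in zip(b, kappa) if bb == r) / n[r]
           for r in range(B) for k in range(2, L + 2)}
    # Vertex-pair sum, equivalent to Eq. (10) after inserting Eq. (15).
    direct = 0.5 * sum(
        H(e[b[i]][b[j]] * kappa[i] * kappa[j] / (e_r[b[i]] * e_r[b[j]]))
        for i in range(N) for j in range(N))
    eq18 = (E - sum(k * log(k) for k in kappa)
            - 0.5 * sum(e[r][s] * log(e[r][s] / (e_r[r] * e_r[s]))
                        for r in range(B) for s in range(B) if e[r][s] > 0))
    series = 0.5 * sum(
        n[r] * n[s] * (e[r][s] / (e_r[r] * e_r[s])) ** (l + 1)
        * mom[r, l + 1] * mom[s, l + 1] / (l * (l + 1))
        for r in range(B) for s in range(B) for l in range(1, L + 1))
    return direct, eq18 - series, eq18


b = [0] * 20 + [1] * 20
kappa = [1] * 16 + [4] * 4 + [1] * 8 + [2] * 12  # block sums: 32 and 32
e = [[24, 8], [8, 24]]                           # e_r = 32 for both blocks
print(soft_dc_entropies(b, kappa, e))
```

The gap between the last two values is the series term that Eq. (18) discards; it grows with the higher moments of the expected degrees.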

A typical situation where Eq. (17) holds is when the expected degree sequence is such that the higher moments are related to the first moment as $\langle\kappa^l\rangle_r = O(\langle\kappa\rangle_r^l)$. This is the case, for instance, of expected degrees distributed according to a Poisson distribution. In this situation, the left-hand side of Eq. (17) can be written as $e_{rs}^{l+1}/(n_r n_s)^l$, and thus Eq. (17) holds when $e_{rs}^2/n_r n_s \ll e_{rs}$, which is often the case for sparse graphs, as discussed before for the non-degree-corrected blockmodels. On the other hand, if the expected degree distributions are broad enough, the higher moments can be such that their contributions to the last term cannot be neglected, even for sparse graphs. One particularly problematic example is that of degree distributions which follow a power law, $n_{(r,\kappa)} \propto \kappa^{-\gamma}$. Strictly speaking, for these distributions all higher moments diverge, $\langle\kappa^l\rangle_r \to \infty$, for $l \ge \gamma - 1$. Of course, this divergence, in itself, is inconsistent with the intrinsic constraints of simple graph ensembles, since it would mean that there are expected degrees $\kappa_i$ in the sequence which are larger than the network size, or otherwise incompatible with the desired block structure. In order to compute the moments correctly, one would need to consider more detailed distributions, e.g. with structural cut-offs which depend on the network size, or the sizes of the blocks [46]. Nevertheless, it is clear that in such situations one would not be able to neglect the entropy terms associated with the higher moments, since they can, in principle, be arbitrarily large.
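The contrast between narrow and broad expected-degree distributions can be illustrated with a single block (B = 1), where $n_r = N$ and $e_{rs} = e_r = e_s = \sum_i\kappa_i$. Taking the left-hand side of Eq. (17) to be $n_r n_s (e_{rs}/e_r e_s)^{l+1}\langle\kappa^{l+1}\rangle_r\langle\kappa^{l+1}\rangle_s$, i.e. the l-th term of the series in Eq. (16) without its $1/l(l+1)$ prefactor, its ratio to $e_{rs}$ reduces to $N^2\langle\kappa^{l+1}\rangle^2/e_{rs}^{\,l+2}$. In the Python sketch below, a constant sequence stands in for a narrow distribution, and a deterministic power-law sequence is built from quantiles of a continuous Pareto law with exponent `gamma`; both choices, and the helper names, are illustrative assumptions rather than anything prescribed in the text.

```python
def eq17_ratio(kappa, l):
    """Left-hand side of Eq. (17) divided by e_rs, for a single block (B = 1).

    With one block, n_r = n_s = N and e_rs = e_r = e_s = sum(kappa), so the
    ratio is N^2 <kappa^(l+1)>^2 / e^(l+2).
    """
    N = len(kappa)
    e = sum(kappa)
    moment = sum(k ** (l + 1) for k in kappa) / N
    return N ** 2 * moment ** 2 / e ** (l + 2)


def power_law_sequence(N, gamma, k_min):
    """Deterministic expected degrees from quantiles of P(k) ~ k^-gamma, k >= k_min."""
    return [k_min * (N / (i + 1)) ** (1 / (gamma - 1)) for i in range(N)]


N = 10_000
narrow = [4.0] * N
broad = power_law_sequence(N, gamma=2.5, k_min=2.0)
for l in (1, 3, 5):
    print(l, eq17_ratio(narrow, l), eq17_ratio(broad, l))
```

For the constant sequence the ratio equals $(\kappa/N)^l$ and shrinks with l, while for the power-law sequence it grows with l and eventually exceeds one, so the higher-order terms of Eq. (16) cannot be dropped.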

Note that certain choices of expected degree sequences are fundamentally incompatible with Eq. (15) and will cause Eq. (16) to diverge. If one inserts Eq. (15) into Eq. (10), the term inside the sum becomes $H(e_{rs}\kappa\kappa'/e_r e_s)$. Since the binary entropy function $H(x)$ is only defined for arguments in the range $0 \le x \le 1$, Eq. (16) will only converge if the following holds,

$$\kappa\kappa' \le \frac{e_r e_s}{e_{rs}}, \tag{19}$$

for all $\kappa, \kappa'$ belonging to blocks r and s, respectively. If Eq. (19) is not fulfilled, then Eq. (15) cannot be used as an approximation for the solution of the system in Eqs. (12) to (14), and consequently Eq. (16) becomes invalid. Note that even if Eq. (19) is strictly fulfilled, it may also be the case that Eq. (15) is a bad approximation, which means there will be strong intrinsic inter-block dissortative degree correlations [47, …]. A sufficient condition for the applicability of Eq. (16) would therefore be $\kappa\kappa' \ll e_r e_s/e_{rs}$, for all $\kappa, \kappa'$ belonging to blocks r and s, respectively. However, it is important to emphasize that even if Eq. (15) is assumed to be a good approximation, it only means that the intrinsic degree correlations between any given block pair r, s can be neglected, but the entropic cost of connecting to a block with a broad degree distribution is still reflected in the last term of Eq. (16). This captures one important entropic effect of broad distributions, which can be important, e.g. in inferring block structures from empirical data, as will be shown in Sec. VI.
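Condition (19) is easy to test before using Eq. (16). The Python sketch below (hypothetical helper `check_eq19`; it takes per-vertex block labels `b`, expected degrees `kappa`, and an edge-count matrix `e` with doubled diagonal) lists the vertex pairs that violate it and reports the largest argument $x = e_{rs}\kappa\kappa'/e_r e_s$ that would be passed to $H(x)$. Values of x close to one suggest that Eq. (15) may be a poor approximation even when Eq. (19) holds.

```python
def check_eq19(b, kappa, e):
    """Check Eq. (19) for every vertex pair, after inserting Eq. (15) into Eq. (10).

    Returns (violations, x_max): the pairs (i, j), i <= j, whose argument
    x = e_rs kappa_i kappa_j / (e_r e_s) exceeds 1, and the largest x found.
    """
    e_r = [sum(row) for row in e]
    violations, x_max = [], 0.0
    for i in range(len(b)):
        for j in range(i, len(b)):
            r, s = b[i], b[j]
            x = e[r][s] * kappa[i] * kappa[j] / (e_r[r] * e_r[s])
            x_max = max(x_max, x)
            if x > 1:
                violations.append((i, j))
    return violations, x_max


# Hypothetical example: one vertex of block 0 has a very large expected degree.
b = [0] * 5 + [1] * 5
kappa = [1, 1, 1, 1, 6] + [2] * 5
e = [[6, 4], [4, 6]]
print(check_eq19(b, kappa, e))
```

Here the high-degree vertex violates Eq. (19) with itself ($x = 6\cdot 36/100$), so Eq. (16) could not be used for this sequence.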



1. Directed graphs 

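The directed degree-corrected variants can be analysed in analogous fashion, by separating vertices into blocks depending on their expected in- and out-degrees, leading to block labels given by $(r, \kappa^-, \kappa^+)$, which are included directly into Eq. (9) above. This leads to an expression equivalent to Eq. (10), which is omitted here for brevity. The "classical" limit can also be taken, which results in the expression,

$$e_{(r,\kappa^-,\kappa^+),(s,\kappa'^-,\kappa'^+)} \approx \frac{e_{rs}\,\kappa^+\kappa'^-\, n_{(r,\kappa^-,\kappa^+)}\, n_{(s,\kappa'^-,\kappa'^+)}}{e_r^+ e_s^-}, \tag{20}$$

which if inserted into the degree-corrected entropy expression leads to,

$$S_{gsu} \approx E - \sum_{\kappa^+} N_{\kappa^+}\kappa^+\ln\kappa^+ - \sum_{\kappa^-} N_{\kappa^-}\kappa^-\ln\kappa^- - \sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r^+ e_s^-}\right) - \sum_{rs} n_r n_s\sum_{l=1}^{\infty}\frac{1}{l(l+1)}\left(\frac{e_{rs}}{e_r^+ e_s^-}\right)^{l+1}\langle(\kappa^+)^{l+1}\rangle_r\langle(\kappa^-)^{l+1}\rangle_s, \tag{21}$$

where $N_{\kappa^+}$ and $N_{\kappa^-}$ are the numbers of vertices with expected out-degree $\kappa^+$ and expected in-degree $\kappa^-$, respectively. The same caveats as in the undirected case regarding the suitability of Eq. (20), and consequently the validity of Eq. (21), apply.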



IV. MULTIGRAPH ENSEMBLES 

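We now consider the situation where multiple edges between the same vertex pair are allowed. The total number of different edge choices between blocks r and s now becomes,

$$\Omega_{rs} = \left(\!\!\binom{n_r n_s}{e_{rs}}\!\!\right), \qquad \Omega_{rr} = \left(\!\!\binom{n_r(n_r+1)/2}{e_{rr}/2}\!\!\right), \tag{22}$$

where $\left(\!\binom{N}{m}\!\right) = \binom{N+m-1}{m}$ is the total number of m-combinations with repetition from a set of size N. Like for simple graphs, the total number of graphs is given by the total number of vertex pairings between all blocks,

$$\Omega = \prod_{r \ge s}\Omega_{rs}, \tag{23}$$

which leads to the entropy,

$$S_m = \frac{1}{2}\sum_{rs}(n_r n_s + e_{rs})\, H\!\left(\frac{e_{rs}}{n_r n_s + e_{rs}}\right), \tag{24}$$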



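where $H(x)$ is the binary entropy function (Eq. (3)), as before. If we consider the more usual case when $e_{rs} \ll n_r n_s$, we can expand this expression as,

$$S_m = E - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{n_r n_s}\right) + \frac{1}{2}\sum_{rs} n_r n_s\sum_{l=1}^{\infty}\frac{(-1)^{l+1}}{l(l+1)}\left(\frac{e_{rs}}{n_r n_s}\right)^{l+1}. \tag{25}$$

This is very similar to Eq. (6) for the simple graph ensemble, with the only difference being the alternating sign in the last term. In the sparse limit, the last term can also be dropped, which leads to,

$$S_m \approx E - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{n_r n_s}\right). \tag{26}$$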



In this limit, the entropy is identical to the simple graph 
ensemble, since the probability of observing multiple 
edges vanishes. 
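The relation between Eqs. (5), (24) and (25) can be checked numerically. The Python sketch below uses hypothetical helper names (`S_simple`, `S_multi`, `S_multi_series`) and two arbitrary parameter sets, one sparse and one locally dense; the alternating series is truncated after `L` terms, an assumption that is only adequate for $e_{rs} \le n_r n_s$, where the series converges.

```python
from math import log


def H(x):
    """Binary entropy function of Eq. (3), with H(0) = H(1) = 0."""
    if x == 0 or x == 1:
        return 0.0
    return -x * log(x) - (1 - x) * log(1 - x)


def S_simple(n, e):
    """Eq. (5): undirected simple graphs."""
    B = len(n)
    return 0.5 * sum(n[r] * n[s] * H(e[r][s] / (n[r] * n[s]))
                     for r in range(B) for s in range(B))


def S_multi(n, e):
    """Eq. (24): undirected multigraphs."""
    B = len(n)
    total = 0.0
    for r in range(B):
        for s in range(B):
            m = n[r] * n[s] + e[r][s]
            total += m * H(e[r][s] / m)
    return 0.5 * total


def S_multi_series(n, e, L=400):
    """Eq. (25) with the alternating series truncated after L terms."""
    B = len(n)
    total = sum(map(sum, e)) / 2  # E
    for r in range(B):
        for s in range(B):
            nn = n[r] * n[s]
            x = e[r][s] / nn
            if x > 0:
                total -= 0.5 * e[r][s] * log(x)
            total += 0.5 * nn * sum((-1) ** (l + 1) * x ** (l + 1) / (l * (l + 1))
                                    for l in range(1, L + 1))
    return total


for n, e in (([1000, 1000], [[4000, 1000], [1000, 4000]]),
             ([20, 20], [[300, 100], [100, 300]])):
    print(S_simple(n, e), S_multi(n, e), S_multi_series(n, e))
```

In the sparse case the simple graph and multigraph entropies nearly coincide, as stated above, while in the locally dense case the multigraph ensemble is considerably larger.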
1. Directed graphs



Like for the simple graph case, the entropy for directed multigraphs can be obtained with only small modifications. The number of edge choices $\Omega_{rs}$ is given exactly as in Eq. (22), the only difference being that one no longer needs to differentiate the diagonal term, which in this case becomes $\Omega_{rr} = \Omega_{rs}|_{s=r}$. Since the matrix $e_{rs}$ is in general asymmetric, the total number of graphs becomes the product over all directed r, s pairs,

$$\Omega = \prod_{rs}\Omega_{rs}. \tag{27}$$

Therefore the entropy becomes simply,

$$S_m = \sum_{rs}(n_r n_s + e_{rs})\, H\!\left(\frac{e_{rs}}{n_r n_s + e_{rs}}\right), \tag{28}$$

which is identical to Eq. (24) except for a factor 1/2. (Note that for directed graphs we define $e_{rr}$ as the number of edges internal to block r, not twice this value as in the undirected case.) Again, the same alternative expression as in Eq. (25) can be written, as well as the same approximation as in Eq. (26), which will be identical except for a factor 1/2.



A. Degree-corrected ensembles with "soft" 
constraints 

We proceed again analogously to the simple graph case, and impose that each block is labeled by a pair $(r,\kappa)$, where the first value is the block label itself, and the second is the expected degree block. Using this labeling we can write the full entropy from Eq. (24) as,

$$S_{ms} = \frac{1}{2}\sum_{r\kappa s\kappa'}\left(n_{(r,\kappa)}n_{(s,\kappa')} + e_{(r,\kappa),(s,\kappa')}\right) H\!\left(\frac{e_{(r,\kappa),(s,\kappa')}}{n_{(r,\kappa)}n_{(s,\kappa')} + e_{(r,\kappa),(s,\kappa')}}\right). \tag{29}$$

Like for the simple graph case, this is a general ensemble which allows for arbitrary degree correlations. The "uncorrelated" ensemble is obtained by imposing the constraint given by Eq. (11) and maximizing $S_{ms}$, which leads to the following nonlinear system,

$$e_{(r,\kappa),(s,\kappa')} = \frac{n_{(r,\kappa)}\, n_{(s,\kappa')}}{\exp(\lambda_{rs} + \mu_{r\kappa} + \mu_{s\kappa'}) - 1}, \tag{30}$$

$$e_{rs} = \sum_{\kappa\kappa'} e_{(r,\kappa),(s,\kappa')}, \tag{31}$$

$$\kappa\, n_{(r,\kappa)} = \sum_{s\kappa'} e_{(r,\kappa),(s,\kappa')}, \tag{32}$$



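which must be solved for $\{e_{(r,\kappa),(s,\kappa')}, \lambda_{rs}, \mu_{r\kappa}\}$, where $\{\lambda_{rs}\}$ and $\{\mu_{r\kappa}\}$ are Lagrange multipliers which impose the necessary constraints. Like for the simple graph case, this system does not have a closed-form solution, but one can consider the same "classical" limit, $\exp(\lambda_{rs} + \mu_{r\kappa} + \mu_{s\kappa'}) \gg 1$, which leads to Eq. (15). Inserting it in Eq. (29) and using the series expansion given by Eq. (4), the entropy can be written as,

$$S_{msu} \approx E - \sum_\kappa N_\kappa\,\kappa\ln\kappa - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r e_s}\right) + \frac{1}{2}\sum_{rs} n_r n_s\sum_{l=1}^{\infty}\frac{(-1)^{l+1}}{l(l+1)}\left(\frac{e_{rs}}{e_r e_s}\right)^{l+1}\langle\kappa^{l+1}\rangle_r\langle\kappa^{l+1}\rangle_s. \tag{33}$$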

Again, the difference from the simple graph ensemble is only the alternating sign in the last term. If one takes the sparse limit, the above equation is approximated by Eq. (18), since in this case both ensembles become equivalent.



1. Directed graphs



Directed multigraphs can be analysed in the same way, by using block labels given by $(r, \kappa^-, \kappa^+)$, which are included into Eq. (28) above. This leads to an expression equivalent to Eq. (29), which is omitted here for brevity. The "classical" limit can also be taken, which results in Eq. (20), as for simple graphs. Inserting it into the degree-corrected entropy expression leads finally to,

$$S_{msu} \approx E - \sum_{\kappa^+} N_{\kappa^+}\kappa^+\ln\kappa^+ - \sum_{\kappa^-} N_{\kappa^-}\kappa^-\ln\kappa^- - \sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r^+ e_s^-}\right) + \sum_{rs} n_r n_s\sum_{l=1}^{\infty}\frac{(-1)^{l+1}}{l(l+1)}\left(\frac{e_{rs}}{e_r^+ e_s^-}\right)^{l+1}\langle(\kappa^+)^{l+1}\rangle_r\langle(\kappa^-)^{l+1}\rangle_s, \tag{34}$$

which is once again similar to the simple graph ensemble, except for the alternating sign in the last term. The same caveats as in the simple graph case regarding the suitability of Eq. (20), and consequently the validity of Eq. (34), apply.



V. DEGREE-CORRECTED ENSEMBLES WITH 
"HARD" CONSTRAINTS 

For the case of "hard" degree constraints we cannot easily adapt any of the counting schemes used so far. In fact, for the simpler case of a single block (B = 1), which is the ensemble of random graphs with a prescribed degree sequence [24, …], there is no known asymptotic expression for the entropy which is universally valid. Even the simpler asymptotic counting of graphs with a uniform degree sequence ($k_i = k$ for all i) is an open problem in combinatorics […]. All known expressions are obtained by imposing restrictions on the largest degree of the sequence such that $\max_i k_i \ll N$, where N is the number of vertices in the graph [50]. Here we make similar assumptions, and obtain expressions which are valid only for such sparse limits, in contrast to the other expressions calculated so far. The approach we will take is to start with the ensemble of configurations [51], which contains all possible half-edge pairings obeying a degree sequence. Each configuration (i.e. a specific pairing of half-edges) corresponds to either a simple graph or a multigraph, but any given simple graph or multigraph will correspond to more than one configuration. Knowing the total number of configurations $\Omega^c_{rs}$ between blocks r and s, the total number $\Omega_{rs}$ of edge choices corresponding to distinct graphs can then be written as,

$$\Omega_{rs} = \Omega^c_{rs}\,\Xi_{rs}, \tag{35}$$

where $\Xi_{rs}$ is the fraction of configurations which correspond to distinct simple graphs or multigraphs.

Although counting configurations and counting graphs are different, and so will be the corresponding entropies, there are some stochastic processes and algorithms which generate fully random configurations, instead of graphs. Perhaps the most well known example is the configurational model [52, 53], which is the ensemble of all configurations which obey a prescribed degree sequence. A sample from this ensemble can be obtained with a simple algorithm which randomly matches half-edges [52]. If one rejects multigraphs which are generated by this algorithm, one has a (possibly very inefficient) method of generating random graphs with a prescribed degree sequence, since each simple graph will be generated by the same number of configurations, which is given by $\prod_i k_i!$. However, the same is not true if one attempts to generate multigraphs, since they will not be equiprobable [54], as will be discussed in Sec. V C below.
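The half-edge matching algorithm, together with the rejection step described above, can be sketched in a few lines of Python. This is a minimal single-block illustration (no block constraints), with a hypothetical function name and a `max_tries` cap that we added so the loop always terminates. Shuffling all half-edges uniformly and pairing consecutive entries yields a uniformly random perfect matching, i.e. a uniformly random configuration.

```python
import random


def configuration_model(k, seed=None, reject_multigraphs=False, max_tries=100_000):
    """Sample a configuration (random half-edge matching) for degree sequence k.

    With reject_multigraphs=True, matchings containing self-loops or parallel
    edges are discarded and the matching is redrawn, which yields a uniformly
    random simple graph, since each simple graph arises from prod_i k_i!
    configurations.
    """
    if sum(k) % 2:
        raise ValueError("the sum of degrees must be even")
    rng = random.Random(seed)
    # One list entry (the vertex index) per half-edge.
    stubs = [i for i, ki in enumerate(k) for _ in range(ki)]
    for _ in range(max_tries):
        rng.shuffle(stubs)
        # Pairing consecutive half-edges of a uniform shuffle gives a uniform matching.
        edges = [tuple(sorted(stubs[t:t + 2])) for t in range(0, len(stubs), 2)]
        if not reject_multigraphs:
            return edges
        no_loops = all(i != j for i, j in edges)
        no_parallel = len(set(edges)) == len(edges)
        if no_loops and no_parallel:
            return edges
    raise RuntimeError("no simple graph found within max_tries")


print(configuration_model([3, 2, 2, 2, 1], seed=3))
print(configuration_model([2] * 6, seed=0, reject_multigraphs=True))
```

Without rejection the output may contain self-loops and parallel edges; with rejection the acceptance rate can become very small for broad degree sequences, which is the inefficiency mentioned above.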

A central aspect of computing $\Xi_{rs}$ is the evaluation of the probability of obtaining multiple edges. If we isolate a given pair i, j of vertices, which belong to blocks r and s, respectively, we can write the probability of there being m parallel edges between them as,

$$P_{ij}(m) = \frac{\binom{k'_i}{m}\binom{e_{rs}-k'_i}{k'_j-m}}{\binom{e_{rs}}{k'_j}}, \tag{36}$$



which is the hypergeometric distribution, since each half-edge can only be paired once (i.e. there can be no replacement of half-edges). In the above expression, the degrees $k'_i$ and $k'_j$ reflect the number of edges in each vertex which lie between blocks r and s, which can be smaller than the total degrees, $k_i$ and $k_j$. In general, this expression is not valid independently for all pairs i, j, since the pairing of two half-edges automatically restricts the options available for other half-edges belonging to different vertex pairs. However, in the limit where the largest degrees in each block are much smaller than the total number of vertices in the same blocks, we can neglect such interaction between different placements, since the number of available options is always approximately the same. This is not a rigorous assumption, but it is known to produce results which are compatible with more rigorous (and laborious) analysis […, 40]. In the following we compute the number of configurations and the approximation of $\Xi_{rs}$ for simple graphs and multigraphs, using this assumption.
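Eq. (36) is straightforward to evaluate with exact binomial coefficients. The Python sketch below (hypothetical name `p_parallel`, valid for $r \ne s$) returns zero outside the support of the distribution. Summing it over m gives one, and its mean equals $k'_i k'_j/e_{rs}$, the expected number of edges between i and j; the probability of a multiedge between the pair is the total weight at $m \ge 2$.

```python
from math import comb


def p_parallel(m, ki, kj, e_rs):
    """Eq. (36): probability of m parallel edges between i (block r) and j (block s), r != s.

    ki, kj -- numbers of half-edges of i and j that lie between blocks r and s
    e_rs   -- total number of edges between blocks r and s (ki, kj <= e_rs)
    """
    if m < 0 or m > min(ki, kj):
        return 0.0
    return comb(ki, m) * comb(e_rs - ki, kj - m) / comb(e_rs, kj)


ki, kj, e_rs = 3, 4, 20
probs = [p_parallel(m, ki, kj, e_rs) for m in range(min(ki, kj) + 1)]
print(probs)
print("P(m >= 2):", sum(probs[2:]))
```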



A. Configurations 

For a given block r, the number of different half-edge pairings which obey the desired block structure determined by $e_{rs}$ is given by,

$$\Omega_r = \frac{e_r!}{\prod_s e_{rs}!}. \tag{37}$$



The above counting only considers to which block a given half-edge is connected, not specific half-edges. The exact number of different pairings between two blocks is then given simply by,

$$\Omega^c_{rs} = e_{rs}!, \qquad \Omega^c_{rr} = (e_{rr}-1)!!. \tag{38}$$



Note that the above counting differentiates between permutations of the out-neighbours of the same vertex, which are all equivalent (i.e. correspond to the same graph). This can be corrected in the full number of pairings,

$$\Omega^c = \frac{\prod_r \Omega_r\prod_{s \ge r}\Omega^c_{rs}}{\prod_i k_i!}, \tag{39}$$



where the denominator discounts all equivalent permutations of out-neighbours. Note that the above counting still does not account for the total number of simple graphs, since multiedges are still possible. Multigraphs are also not counted correctly, since for each occurrence of m multiedges between a given vertex pair, the number of different edge pairings which are equivalent decreases by a factor m! [19, 54]. These corrections are considered in the following sections. Taking the logarithm of Eq. 39 and using Stirling's approximation, one obtains

$$S_c = -E - \sum_k N_k \ln k! - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r e_s}\right). \tag{40}$$



It is interesting to compare this expression with the one obtained for soft degree constraints in the sparse limit, Eq. 18. The entropy difference between the two ensembles depends only on the degree sequence,

$$S_{gsu} - S_c = 2E + \sum_k N_k \ln k! - \sum_\kappa N_\kappa\, \kappa \ln \kappa. \tag{41}$$



This difference disappears if the individual degrees are large enough so that Stirling's approximation can be used, i.e. $\ln k! \approx k\ln k - k$, and we have that $\kappa_i = k_i$ for all vertices. Thus, in the sparse limit, but with sufficiently large degrees, the simple graph and multigraph ensembles with soft constraints, and the configuration ensemble with hard constraints, become equivalent [55].
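The following sketch compares the exact logarithm of Eq. 39 (computed with `math.lgamma`) with the Stirling form of Eq. 40 on a small two-block example. The helper names and the example matrix are hypothetical; the convention assumed is that $e_{rr}$ is twice the number of edges inside block r, so that $\sum_r e_r = 2E$. The two values should agree up to the logarithmic terms dropped by Stirling's approximation.

```python
from math import lgamma, log

def log_odd_double_factorial(n):
    """ln(n!!) for odd n >= -1, using (2m-1)!! = (2m)! / (2^m m!)."""
    m = (n + 1) // 2
    return lgamma(2 * m + 1) - m * log(2) - lgamma(m + 1)

def log_omega_c(e, degrees):
    """Exact ln of Eq. 39. e[r][s] counts edges between r and s for r != s;
    e[r][r] is twice the number of edges inside r (hence even)."""
    B = len(e)
    val = 0.0
    for r in range(B):
        e_r = sum(e[r])
        val += lgamma(e_r + 1) - sum(lgamma(x + 1) for x in e[r])   # ln Xi_r, Eq. 37
        val += log_odd_double_factorial(e[r][r] - 1)                # ln Omega_rr, Eq. 38
        val += sum(lgamma(e[r][s] + 1) for s in range(r + 1, B))   # ln Omega_rs, r < s
    return val - sum(lgamma(k + 1) for k in degrees)

def entropy_configurations(e, degrees):
    """Stirling-approximated form, Eq. 40."""
    B = len(e)
    e_r = [sum(row) for row in e]
    E = sum(e_r) / 2
    val = -E - sum(lgamma(k + 1) for k in degrees)
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                val -= 0.5 * e[r][s] * log(e[r][s] / (e_r[r] * e_r[s]))
    return val

# Two blocks: 300 vertices of degree 10 in block 0 and 400 in block 1.
e = [[2000, 1000], [1000, 3000]]
degrees = [10] * 300 + [10] * 400
exact, stirling = log_omega_c(e, degrees), entropy_configurations(e, degrees)
print(exact, stirling)
```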



1. Directed configurations 

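When counting directed configurations, we no longer need to discriminate the diagonal terms of the $\Omega_{rs}$ matrix, which become $\Omega_{rr} = e_{rr}!$. Since the matrix $e_{rs}$ is in general asymmetric, the total number of configurations becomes

$$\Omega_{cd} = \frac{\prod_r e^+_r!\, e^-_r!}{\prod_{rs} e_{rs}! \prod_i k^+_i!\, k^-_i!}, \tag{42}$$

which includes the correction for the permutations of in- and out-degrees. This leads to the entropy

$$S_{cd} = -E - \sum_{k^+} N_{k^+}\ln k^+! - \sum_{k^-} N_{k^-}\ln k^-! - \sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e^+_r e^-_s}\right). \tag{43}$$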
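An analogous check for the directed case, again a sketch with made-up numbers: the exact logarithm of Eq. 42 against Eq. 43, with `e[r][s]` read as the number of edges from block r to block s. The function names and the example in/out-degree sequences are assumptions.

```python
from math import lgamma, log

def log_omega_cd(e, k_out, k_in):
    """Exact ln of Eq. 42; e[r][s] is the number of edges from block r to block s."""
    B = len(e)
    e_out = [sum(e[r]) for r in range(B)]
    e_in = [sum(e[r][s] for r in range(B)) for s in range(B)]
    val = sum(lgamma(x + 1) for x in e_out + e_in)
    val -= sum(lgamma(e[r][s] + 1) for r in range(B) for s in range(B))
    return val - sum(lgamma(k + 1) for k in k_out + k_in)

def entropy_directed_configurations(e, k_out, k_in):
    """Stirling-approximated form, Eq. 43."""
    B = len(e)
    e_out = [sum(e[r]) for r in range(B)]
    e_in = [sum(e[r][s] for r in range(B)) for s in range(B)]
    E = sum(e_out)
    val = -E - sum(lgamma(k + 1) for k in k_out + k_in)
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                val -= e[r][s] * log(e[r][s] / (e_out[r] * e_in[s]))
    return val

# 200 vertices per block; out-degree 10 everywhere, in-degrees 12 (block 0) and 8 (block 1).
e = [[1500, 500], [900, 1100]]
k_out = [10] * 400
k_in = [12] * 200 + [8] * 200
exact_d = log_omega_cd(e, k_out, k_in)
stirling_d = entropy_directed_configurations(e, k_out, k_in)
print(exact_d, stirling_d)
```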



B. Simple graphs 

Following [53], if we proceed with the assumption outlined above that $P^{rs}_{ij}(m)$ are independent probabilities of there being m edges between vertices i and j, we can write the probability $\Sigma_{rs}$ that a configuration corresponds to a simple graph as

$$\Sigma_{rs} \approx \prod_{ij}\left[P^{rs}_{ij}(0) + P^{rs}_{ij}(1)\right], \quad r \neq s, \tag{44}$$

$$\Sigma_{rr} \approx \prod_{i>j}\left[P^{rr}_{ij}(0) + P^{rr}_{ij}(1)\right] \times \prod_i P^r_i(\bar{l}), \tag{45}$$

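where the product is taken over all vertex pairs belonging to blocks r and s, respectively, and $P^r_i(\bar{l})$ is the probability of there being no self-loops attached to vertex i, belonging to block r. This is given by computing the probability that none of the $k^r_i$ half-edge placements is a self-loop,

$$P^r_i(\bar{l}) = \frac{\binom{e_{rr}-k^r_i}{k^r_i}\, k^r_i!\, (e_{rr}-2k^r_i-1)!!}{(e_{rr}-1)!!} \tag{46}$$

$$= \frac{(e_{rr}-k^r_i)!\, (e_{rr}-2k^r_i-1)!!}{(e_{rr}-2k^r_i)!\, (e_{rr}-1)!!}, \tag{47}$$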



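where we also make the assumption that these probabilities are independent for all vertices. We proceed by applying Stirling's approximation up to logarithmic terms, i.e. $\ln x! \approx (x + 1/2)\ln x - x$, and expanding the probabilities in powers of $1/e_{rs}$, leading to

$$\ln\left[P^{rs}_{ij}(0) + P^{rs}_{ij}(1)\right] \approx -\frac{k^s_i(k^s_i-1)\, k^r_j(k^r_j-1)}{2e_{rs}^2} + O(1/e_{rs}^3), \tag{48}$$

and

$$\ln P^r_i(\bar{l}) \approx -\frac{k^r_i(k^r_i-1)}{2e_{rr}} + O(1/e_{rr}^2). \tag{49}$$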
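A small sketch, assuming Eq. 47 as written above, that evaluates the exact no-self-loop probability with log-gamma functions and compares it with the leading term of Eq. 49. The function names and the values k = 5, $e_{rr}$ = 2000 are illustrative. Two small cases serve as sanity checks: with k = 1 no self-loop is possible, and with k = 2, $e_{rr}$ = 4 two of the three perfect matchings avoid the self-loop.

```python
from math import lgamma, log

def log_odd_double_factorial(n):
    """ln(n!!) for odd n >= -1, using (2m-1)!! = (2m)! / (2^m m!)."""
    m = (n + 1) // 2
    return lgamma(2 * m + 1) - m * log(2) - lgamma(m + 1)

def log_prob_no_self_loop(k, e_rr):
    """ln P_i^r(no self-loop), Eq. 47, for k half-edges of one vertex among
    the e_rr (even) half-edges that are paired inside block r."""
    return (lgamma(e_rr - k + 1) - lgamma(e_rr - 2 * k + 1)
            + log_odd_double_factorial(e_rr - 2 * k - 1)
            - log_odd_double_factorial(e_rr - 1))

k, e_rr = 5, 2000
exact = log_prob_no_self_loop(k, e_rr)
leading = -k * (k - 1) / (2 * e_rr)      # Eq. 49 without the O(1/e_rr^2) remainder
print(exact, leading)
```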



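As mentioned before, the degrees $k^s_i$ and $k^r_j$ in Eq. 48 are the number of edges of each vertex which lie between blocks r and s. Since the total degrees $k_i$ and $k_j$ are assumed to be much smaller than the number of half-edges leaving each block, we can consider $k^s_i$, for $i \in r$, to be a binomially distributed random number in the range $[0, k_i]$, with probability $e_{rs}/e_r$. We can therefore write $\langle k^s_i\rangle = k_i e_{rs}/e_r$ and $\langle k^s_i(k^s_i-1)\rangle = k_i(k_i-1)e_{rs}^2/e_r^2$, where the average is taken over all vertices with the same degree and in the same block r. Putting it all together we obtain an expression for the entropy which reads

$$S_{ghu} \approx -E - \sum_k N_k \ln k! - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r e_s}\right) - \frac{1}{4}\sum_{rs} n_r n_s \frac{e_{rs}^2}{e_r^2 e_s^2}\left(\langle k^2\rangle_r - \langle k\rangle_r\right)\left(\langle k^2\rangle_s - \langle k\rangle_s\right) - \sum_r n_r \frac{e_{rr}}{2e_r^2}\left(\langle k^2\rangle_r - \langle k\rangle_r\right), \tag{50}$$

where $\langle k\rangle_r = \sum_{i\in r} k_i/n_r$ and $\langle k^2\rangle_r = \sum_{i\in r} k_i^2/n_r$.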

If we make B = 1, the ensemble is equivalent to fully random graphs with an imposed degree sequence. In this case, Eq. 50 becomes identical to the known expression derived in [40], for the limit $k_i \ll N$ (which is known to be valid for $\max(\{k_i\}) \sim o(\sqrt{N})$ [56]). This expression is also compatible with the one later derived in |M] (except for a trivial constant). Therefore we have obtained an expression which is fully consistent with the known special case without block structure.
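Eq. 50 is straightforward to evaluate from a block matrix and a partition. The sketch below (the function name and the example degree sequence are hypothetical) implements it term by term and checks the B = 1 reduction discussed above against the familiar asymptotic form $\ln[(2E-1)!!/\prod_i k_i!] - \lambda - \lambda^2$ for simple graphs with a given degree sequence, writing $\ln(2E-1)!! \approx E\ln(2E) - E$ and $\lambda = (\langle k^2\rangle - \langle k\rangle)/2\langle k\rangle$.

```python
from math import lgamma, log

def entropy_simple_hard(e, b, degrees):
    """Eq. 50 for undirected simple graphs with a hard degree sequence.
    e[r][s]: edges between blocks r != s; e[r][r]: twice the edges inside r.
    b[i]: block of vertex i; every block is assumed to be non-empty."""
    B = len(e)
    e_r = [sum(row) for row in e]
    E = sum(e_r) / 2
    c = [0.0] * B                        # c[r] = n_r (<k^2>_r - <k>_r)
    for r, k in zip(b, degrees):
        c[r] += k * k - k
    val = -E - sum(lgamma(k + 1) for k in degrees)
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                val -= 0.5 * e[r][s] * log(e[r][s] / (e_r[r] * e_r[s]))
            val -= 0.25 * e[r][s] ** 2 / (e_r[r] ** 2 * e_r[s] ** 2) * c[r] * c[s]
        val -= e[r][r] / (2 * e_r[r] ** 2) * c[r]          # self-loop term
    return val

# B = 1 check.
degrees = [1, 2, 2, 3, 3, 3, 4, 6]
N = len(degrees)
E = sum(degrees) / 2
mean_k = sum(degrees) / N
mean_k2 = sum(k * k for k in degrees) / N
lam = (mean_k2 - mean_k) / (2 * mean_k)
known = E * log(2 * E) - E - sum(lgamma(k + 1) for k in degrees) - lam - lam ** 2
ours = entropy_simple_hard([[2 * E]], [0] * N, degrees)
print(ours, known)
```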

It is interesting to compare Eq. 50 with the equivalent expression for the case with soft degree constraints, Eq. 16. Eq. 50 is less complete than Eq. 16, since it contains terms of order comparable only to the first term of the sum in Eq. 16. Furthermore, in Eq. 50 the last terms involve the difference $\langle k^2\rangle - \langle k\rangle$, instead of the second moment $\langle k^2\rangle$ as in Eq. 16. [It is worth noting that Eq. 50 passes the "sanity check" of making $\langle k^2\rangle_r = \langle k\rangle_r$, which is only possible for the uniform degree sequence $k_i = 1$, in which case no parallel edges are possible, and the entropy becomes identical to the ensemble of configurations, Eq. 40.] Thus we can conclude that the two ensembles (with soft and hard constraints) are only equivalent in the sufficiently sparse case when the differences in the remaining higher order terms in Eq. 16 can be neglected, and when the degrees are large enough (or the distributions broad enough) so that $\langle k^2\rangle_r \gg \langle k\rangle_r$, and the self-loop term can also be discarded.



1. Directed graphs 

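For directed graphs one can proceed stepwise with an analogous calculation, with the only difference that the probability of self-loops in this case involves the in- and out-degree of the same vertex, and can be obtained by the hypergeometric distribution,

$$P^r_i(\bar{l}) = \frac{\binom{e_{rr}-k^{-r}_i}{k^{+r}_i}}{\binom{e_{rr}}{k^{+r}_i}} \tag{51}$$

$$\approx \exp\left(-\frac{k^{+r}_i k^{-r}_i}{e_{rr}}\right), \tag{52}$$

where $k^{+r}_i$ and $k^{-r}_i$ are the numbers of out- and in-edges of vertex i connected to block r. The analogous expression to Eq. 50 then becomes

$$S_{ghd} \approx -E - \sum_{k^+} N_{k^+}\ln k^+! - \sum_{k^-} N_{k^-}\ln k^-! - \sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e^+_r e^-_s}\right) - \frac{1}{2}\sum_{rs} n_r n_s \frac{e_{rs}^2}{(e^+_r)^2 (e^-_s)^2}\left(\langle (k^+)^2\rangle_r - \langle k^+\rangle_r\right)\left(\langle (k^-)^2\rangle_s - \langle k^-\rangle_s\right) - \sum_r n_r \frac{e_{rr}}{e^+_r e^-_r}\langle k^+ k^-\rangle_r. \tag{53}$$

Similarly to Eq. 50, if we make B = 1, we recover the known expression derived in [ID] for the number of directed simple graphs with imposed in/out-degree sequence, obtained for the limit $k^\pm \ll N$.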
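The same kind of check for Eq. 53, as a sketch: for B = 1 it should reduce to the directed counterpart $E\ln E - E - \sum\ln k^+! - \sum\ln k^-! - \langle k^+k^-\rangle/\langle k\rangle - (\langle (k^+)^2\rangle - \langle k\rangle)(\langle (k^-)^2\rangle - \langle k\rangle)/2\langle k\rangle^2$. The function name and the example in/out-degree sequences are illustrative assumptions.

```python
from math import lgamma, log

def entropy_directed_simple_hard(e, b, k_out, k_in):
    """Eq. 53; e[r][s] is the number of edges from block r to block s.
    Every block is assumed to have non-zero in- and out-edge totals."""
    B = len(e)
    e_out = [sum(e[r]) for r in range(B)]
    e_in = [sum(e[r][s] for r in range(B)) for s in range(B)]
    E = sum(e_out)
    c_out, c_in, kk = [0.0] * B, [0.0] * B, [0.0] * B
    for r, kp, km in zip(b, k_out, k_in):
        c_out[r] += kp * kp - kp         # n_r (<(k+)^2>_r - <k+>_r)
        c_in[r] += km * km - km          # n_r (<(k-)^2>_r - <k->_r)
        kk[r] += kp * km                 # n_r <k+ k->_r
    val = -E - sum(lgamma(k + 1) for k in k_out + k_in)
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                val -= e[r][s] * log(e[r][s] / (e_out[r] * e_in[s]))
            val -= 0.5 * e[r][s] ** 2 / (e_out[r] ** 2 * e_in[s] ** 2) * c_out[r] * c_in[s]
        val -= e[r][r] / (e_out[r] * e_in[r]) * kk[r]   # self-loop term
    return val

k_out = [1, 2, 3, 0, 2]
k_in = [2, 1, 1, 3, 1]
N, E = len(k_out), sum(k_out)
m = E / N
a_out = sum(k * k for k in k_out) / N - m
a_in = sum(k * k for k in k_in) / N - m
kk_mean = sum(p * q for p, q in zip(k_out, k_in)) / N
known = (E * log(E) - E - sum(lgamma(k + 1) for k in k_out + k_in)
         - kk_mean / m - a_out * a_in / (2 * m ** 2))
ours = entropy_directed_simple_hard([[E]], [0] * N, k_out, k_in)
print(ours, known)
```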



C. Multigraphs 

All configurations which are counted in Eq. 40 are multigraphs, but not all multigraphs are counted the same number of times. More precisely, for each vertex pair of a given graph with m edges between them, the number of configurations which generate this graph is smaller by a factor of m!, compared to a simple graph of the same ensemble [19, 54]. This means that the denominator of Eq. 39 overcounts the number of equivalent configurations for graphs with multiedges. Hence, similarly to the simple graph case, we can calculate the correction

$$\Sigma_{rs} \approx \prod_{ij}\langle m!\rangle^{rs}_{ij}, \quad r \neq s, \tag{54}$$

$$\Sigma_{rr} \approx \prod_{i>j}\langle m!\rangle^{rr}_{ij} \times \prod_i \langle (2m)!!\rangle^r_i, \tag{55}$$

where $\langle m!\rangle^{rs}_{ij} = \sum_{m\geq 0} m!\, P^{rs}_{ij}(m)$ is the average correction factor, and $\langle (2m)!!\rangle^r_i = \sum_{m\geq 0} (2m)!!\, P^r_i(m)$ accounts for the parallel self-loops, with $P^r_i(m)$ being the probability of observing m parallel self-loops on vertex i, belonging to block r. It is easy to see that $P^r_i(m = 0) = P^r_i(\bar{l})$, given by Eq. 47, $P^r_i(m = 1) = \binom{k^r_i}{2}/e_{rr} + O(1/e_{rr}^2)$, and $P^r_i(m > 1) \sim O(1/e_{rr}^2)$. We proceed by applying Stirling's approximation up to logarithmic terms, i.e. $\ln x! \approx (x + 1/2)\ln x - x$, and expanding the sum in powers of $1/e_{rs}$, which leads to

$$\ln\langle m!\rangle^{rs}_{ij} \approx \frac{k^s_i(k^s_i-1)\, k^r_j(k^r_j-1)}{2e_{rs}^2} + O(1/e_{rs}^3), \tag{56}$$

$$\ln\langle (2m)!!\rangle^r_i \approx \frac{k^r_i(k^r_i-1)}{2e_{rr}} + O(1/e_{rr}^2). \tag{57}$$

Using that $\langle k^s_i\rangle = k_i e_{rs}/e_r$ and $\langle k^s_i(k^s_i-1)\rangle = k_i(k_i-1)e_{rs}^2/e_r^2$, and putting it all together, we obtain an expression for the entropy which reads

$$S_{mhu} \approx -E - \sum_k N_k \ln k! - \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r e_s}\right) + \frac{1}{4}\sum_{rs} n_r n_s \frac{e_{rs}^2}{e_r^2 e_s^2}\left(\langle k^2\rangle_r - \langle k\rangle_r\right)\left(\langle k^2\rangle_s - \langle k\rangle_s\right) + \sum_r n_r \frac{e_{rr}}{2e_r^2}\left(\langle k^2\rangle_r - \langle k\rangle_r\right), \tag{58}$$

where $\langle k\rangle_r = \sum_{i\in r} k_i/n_r$ and $\langle k^2\rangle_r = \sum_{i\in r} k_i^2/n_r$. This expression is very similar to the one obtained for the simple graph ensemble, except for the sign of the last two terms.

Again, if we make B = 1, the ensemble is equivalent to fully random multigraphs with an imposed degree sequence. In this case, Eq. 58 becomes identical to the known expression derived in [57], for the limit $k_i \ll N$. It also corresponds to the expression derived in [40], which does not include the last term, since in that work parallel self-edges are effectively counted as contributing degree one to a vertex, instead of two as is more typical.

1. Directed multigraphs

For directed graphs one can proceed stepwise with an analogous calculation, which leads to

$$S_{mhd} \approx -E - \sum_{k^+} N_{k^+}\ln k^+! - \sum_{k^-} N_{k^-}\ln k^-! - \sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e^+_r e^-_s}\right) + \frac{1}{2}\sum_{rs} n_r n_s \frac{e_{rs}^2}{(e^+_r)^2 (e^-_s)^2}\left(\langle (k^+)^2\rangle_r - \langle k^+\rangle_r\right)\left(\langle (k^-)^2\rangle_s - \langle k^-\rangle_s\right). \tag{59}$$

Note that in this case the calculation of the correction term for self-loops is no different than for other parallel edges, and hence there is no self-loop term as in Eq. 58. Like before, if we make B = 1, we recover the known expression derived in [39] for the number of multigraphs with imposed in/out-degree sequence, obtained for the limit $k^\pm \ll N$.
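Since Eqs. 50 and 58 differ only in the sign of the last two terms, both can be computed from the same pieces. In this sketch (hypothetical function name, toy two-block example with $e_{rr}$ counted twice) the simple-graph and multigraph entropies should straddle the configuration entropy of Eq. 40 symmetrically.

```python
from math import lgamma, log

def hard_entropies(e, b, degrees):
    """Return (S_c, S_ghu, S_mhu) following Eqs. 40, 50 and 58.
    e[r][r] is twice the number of edges inside block r."""
    B = len(e)
    e_r = [sum(row) for row in e]
    E = sum(e_r) / 2
    c = [0.0] * B                        # n_r (<k^2>_r - <k>_r)
    for r, k in zip(b, degrees):
        c[r] += k * k - k
    s_c = -E - sum(lgamma(k + 1) for k in degrees)
    corr = 0.0                           # magnitude of the degree-dependent terms
    for r in range(B):
        for s in range(B):
            if e[r][s] > 0:
                s_c -= 0.5 * e[r][s] * log(e[r][s] / (e_r[r] * e_r[s]))
            corr += 0.25 * e[r][s] ** 2 / (e_r[r] ** 2 * e_r[s] ** 2) * c[r] * c[s]
        corr += e[r][r] / (2 * e_r[r] ** 2) * c[r]
    # Simple graphs subtract these terms (Eq. 50); multigraphs add them (Eq. 58).
    return s_c, s_c - corr, s_c + corr

e = [[10, 4], [4, 6]]
b = [0, 0, 0, 0, 1, 1, 1]
degrees = [2, 3, 4, 5, 3, 3, 4]          # block sums 14 and 10 match e_0 and e_1
s_c, s_g, s_m = hard_entropies(e, b, degrees)
print(s_c, s_g, s_m)
```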



VI. BLOCKMODEL DETECTION 

The central problem which motivated a large part of the existing literature on stochastic blockmodels is the detection of the most likely ensemble which generated a given network realization. Solving this problem allows one to infer latent block structures in empirical data, providing a meaningful way of grouping vertices into equivalence classes. Blockmodel detection stands in contrast to the usual approach of community detection [7], which focuses almost solely on specific block structures where nodes are connected in dense groups, which are sparsely connected to each other (this corresponds to the special case of a stochastic blockmodel where the diagonal elements of the matrix $e_{rs}$ are the largest).

As mentioned in the introduction, the stochastic blockmodel entropy can be used directly as a log-likelihood function $\mathcal{L} = \ln\mathcal{P} = -S$, if one assumes that each network realization in the ensemble occurs with the same probability $\mathcal{P} = 1/\Omega$. Maximizing this log-likelihood can be used as a well-justified method of inferring the most likely blockmodel which generated a given network realization [10, 12]. Stochastic blockmodels belong to the family of exponential random graphs [58, 55], and as such display the asymptotic property of consistently generating networks from which the original model can be inferred, if the networks are large enough |TUl I60|.

In [12] a log-likelihood function for the degree-corrected stochastic blockmodel ensemble was derived, in the limit where the network is sufficiently sparse. As we will show, using the entropy expressions derived here, we obtain a log-likelihood function which generalizes the expression obtained in [12], which is recovered when one assumes not only that the graph is sufficiently sparse, but also that the degree distribution is not very broad. Since this specific situation has been covered in detail in [12], we focus here on a simple, representative example where the degree distribution is broad enough that assuming this limit leads to misleading results. Network topologies which exhibit entropic effects due to broad degree distributions are often found in real systems, of which perhaps the best known is the internet |47l H5]. We also consider the situation where there are "extrinsic" degree correlations, in addition to the latent block structure. The same methods can be used in a straightforward way for multigraph or directed ensembles, using the corresponding entropy expressions derived in the previous sections.

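Given a network realization, the task of blockmodel inference consists in finding a block partition $\{b_i\} \in [0, B-1]^N$ of the vertices which maximizes the log-likelihood function $\mathcal{L}$. Considering, for instance, the degree-corrected blockmodel ensemble with "soft" degree constraints [61], using Eq. 16 one can write the following log-likelihood function,

$$\mathcal{L}(G|\{b_i\}) = \frac{1}{2}\sum_{rs} e_{rs}\ln\left(\frac{e_{rs}}{e_r e_s}\right) + \sum_{l=1}^{L}\frac{1}{2l(l+1)}\sum_{rs} n_r n_s \langle k^{l+1}\rangle_r \langle k^{l+1}\rangle_s \left(\frac{e_{rs}}{e_r e_s}\right)^{l+1}, \tag{60}$$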



where $\langle k^{l+1}\rangle_r$ is the $(l+1)$-th moment of the degrees in block r, the terms not depending on the block partition $\{b_i\}$ were dropped, and L is a parameter which controls how many terms in the sum are considered. Using this function, we encompass the following cases:

1. For L = 0, the objective function derived in fTT is recovered, which corresponds to the situation where the second term can be neglected entirely.

2. For L > 0, higher order corrections are considered, which may be relevant if the higher moments of the degree sequence in each block are sufficiently large.
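A direct transcription of Eq. 60 into Python, as a sketch: it builds $e_{rs}$ from an edge list, uses the observed degrees $k_i$ in the moments $\langle k^{l+1}\rangle_r$, and skips block pairs with $e_{rs} = 0$ (whose terms vanish). The function name and the two-clique test graph are assumptions, the latter chosen only because its planted partition is obvious.

```python
from math import log

def log_likelihood(edges, b, degrees, B, L=0):
    """Eq. 60 for an undirected graph given as (i, j) pairs, with block labels
    b[i] in range(B) and observed degrees used in place of expected ones."""
    e = [[0] * B for _ in range(B)]
    for i, j in edges:
        e[b[i]][b[j]] += 1
        e[b[j]][b[i]] += 1               # diagonal ends up as twice the internal edges
    e_r = [sum(row) for row in e]
    # mom[l][r] = sum over i in r of k_i^(l+1), i.e. n_r <k^(l+1)>_r
    mom = [[0.0] * B for _ in range(L + 1)]
    for i, k in enumerate(degrees):
        for l in range(1, L + 1):
            mom[l][b[i]] += k ** (l + 1)
    ll = 0.0
    for r in range(B):
        for s in range(B):
            if e[r][s] == 0:
                continue
            x = e[r][s] / (e_r[r] * e_r[s])
            ll += 0.5 * e[r][s] * log(x)
            for l in range(1, L + 1):
                ll += mom[l][r] * mom[l][s] * x ** (l + 1) / (2 * l * (l + 1))
    return ll

def clique(vertices):
    return [(u, v) for u in vertices for v in vertices if u < v]

# Two 5-cliques joined by one edge.
edges = clique(range(5)) + clique(range(5, 10)) + [(4, 5)]
degrees = [0] * 10
for i, j in edges:
    degrees[i] += 1
    degrees[j] += 1
planted = [0] * 5 + [1] * 5
alternating = [i % 2 for i in range(10)]
print(log_likelihood(edges, planted, degrees, 2), log_likelihood(edges, alternating, degrees, 2))
```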

The general approach used here is to maximize $\mathcal{L}$, as given by Eq. 60, by starting with a random block partition, and changing the block membership of a given vertex to the value for which $\mathcal{L}$ is maximal, and proceeding in the same way repeatedly for all vertices, until no further improvement is possible. The algorithmic complexity of updating the membership of a single vertex in such a "greedy" manner is $O(B[B(L+1) + \langle k\rangle])$, which does not depend on the system size, and is therefore efficient as long as B is not too large. However, this algorithm will often get stuck in a local maximum, so one has to start over from a different random partition, and compare the maxima obtained. Repeating this a few times is often enough to find the optimal solution [62].
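The greedy procedure described above can be sketched as follows. For brevity this version recomputes Eq. 60 from scratch for every trial move, so each move costs O(EB) rather than the $O(B[B(L+1) + \langle k\rangle])$ quoted in the text, which requires updating $e_{rs}$ and the block moments incrementally. The function names, the number of restarts and the seed are assumptions.

```python
import random
from math import log

def log_likelihood(edges, b, degrees, B, L=0):
    """Eq. 60 (compact copy so that this block runs on its own)."""
    e = [[0] * B for _ in range(B)]
    for i, j in edges:
        e[b[i]][b[j]] += 1
        e[b[j]][b[i]] += 1
    e_r = [sum(row) for row in e]
    mom = [[0.0] * B for _ in range(L + 1)]      # mom[l][r] = n_r <k^(l+1)>_r
    for i, k in enumerate(degrees):
        for l in range(1, L + 1):
            mom[l][b[i]] += k ** (l + 1)
    ll = 0.0
    for r in range(B):
        for s in range(B):
            if e[r][s]:
                x = e[r][s] / (e_r[r] * e_r[s])
                ll += 0.5 * e[r][s] * log(x)
                ll += sum(mom[l][r] * mom[l][s] * x ** (l + 1) / (2 * l * (l + 1))
                          for l in range(1, L + 1))
    return ll

def greedy_partition(edges, degrees, B, L=0, n_starts=5, seed=0):
    """Greedy single-vertex moves from several random starts; keeps the best."""
    rng = random.Random(seed)
    N = len(degrees)
    best_b, best_ll = None, float("-inf")
    for _ in range(n_starts):
        b = [rng.randrange(B) for _ in range(N)]
        ll = log_likelihood(edges, b, degrees, B, L)
        improved = True
        while improved:                      # sweep until no single move helps
            improved = False
            for i in range(N):
                current = b[i]
                for r in range(B):
                    if r == current:
                        continue
                    b[i] = r
                    new_ll = log_likelihood(edges, b, degrees, B, L)
                    if new_ll > ll + 1e-12:  # keep the best label seen so far
                        ll, current, improved = new_ll, r, True
                b[i] = current
        if ll > best_ll:
            best_b, best_ll = b[:], ll
    return best_b, best_ll

# Two 5-cliques joined by one edge.
edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
edges += [(u, v) for u in range(5, 10) for v in range(u + 1, 10)] + [(4, 5)]
degrees = [sum(x in edge for edge in edges) for x in range(10)]
best_b, best_ll = greedy_partition(edges, degrees, B=2, L=1, n_starts=5, seed=1)
print(best_b, best_ll)
```

Because each restart stops at a local maximum, comparing several restarts, as the text suggests, is what guards against poor starting partitions.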



In the following we will consider a representative example where the terms for L > 0 are indeed relevant and result in different block partitions, when compared to L = 0. Instead of testing the general approach in difficult cases, we deliberately choose a very simple scenario, where the block structure is very well defined, in order to make the block identification as easy as possible. However, as we will see, even in these rather extreme cases, not properly accounting for the correct entropic effects will lead to spurious results, which is the case with L = 0.



A. Intrinsic degree correlations 

In order to illustrate the use of the objective function given by Eq. 60, we will consider a simple diagonal blockmodel defined as

$$e_{rs} \propto w\,\delta_{rs} + (1 - w)(1 - \delta_{rs}), \tag{61}$$

where $w \in [0, 1]$ is a free parameter, and all blocks have equal size. Furthermore, independently of the block membership, the degrees will be distributed according to a Zipf distribution within a certain range,

$$P_k \propto \begin{cases} k^{-\gamma}, & \text{if } k \in [k_{\min}, k_{\max}],\\ 0, & \text{otherwise}. \end{cases} \tag{62}$$

FIG. 2. (Color online) Average nearest neighbour degree $\langle k\rangle_{nn}(k)$ as a function of the degree of the originating vertex k, for the model with intrinsic (left) and extrinsic (right) degree correlations.



This choice allows for precise control of how broad the distribution is. Here we will consider a typical sample from this ensemble, with N = 10'^ vertices, B = 4, a strong block structure with w = 0.99, and a degree distribution with $\gamma = 1.1$ and $[k_{\min}, k_{\max}] = [30, 200]$. As mentioned before, this strong block structure is deliberately chosen to make the detection task more straightforward. The sample was generated using the Metropolis-Hastings algorithm [63, 64], by starting with some network with a degree sequence sampled from the desired distribution, and the block labels distributed randomly among the nodes. At each step, the end points of two randomly chosen edges are swapped, such that the degree sequence is preserved. The probability difference $\Delta p = p' - p$ is computed, where $p \propto \prod_{i<j} e_{b_i b_j}^{A_{ij}}$ (with $A_{ij}$ the adjacency matrix) is the probability of observing the given network before the move, and $p'$ is the same probability after the move. If $\Delta p$ is positive the move is accepted, otherwise it is rejected with probability $1 - p'/p$. Additionally, a move is always rejected if it generates a parallel edge or a self-loop. If the probabilities are nonzero, this defines a Markov chain which fulfills detailed balance, and which is known to be ergodic [65-67], and thus generates samples with the correct probability after equilibrium is reached [55].
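The Metropolis-Hastings edge-swap step can be sketched as below, assuming, as in the text, that the weight of a network is the product of $e_{b_i b_j}$ over its edges. Choosing two distinct edges and one of the two possible rewirings uniformly gives a symmetric proposal, so accepting with probability $\min(1, p'/p)$ is the Metropolis rule; rejected moves leave the network unchanged. The ring-lattice starting graph, N = 40 and the number of steps are hypothetical choices for a quick test, far smaller than the samples used here.

```python
import random
from collections import Counter

def mh_edge_swaps(edges, b, weight, n_steps, seed=0):
    """Degree-preserving edge swaps targeting p ∝ prod over edges of weight[b_i][b_j].
    Swaps that would create a self-loop or a parallel edge are rejected."""
    rng = random.Random(seed)
    edges = list(edges)
    present = {frozenset(x) for x in edges}
    for _ in range(n_steps):
        a, c = rng.sample(range(len(edges)), 2)
        (i, j), (u, v) = edges[a], edges[c]
        if rng.random() < 0.5:
            u, v = v, u                          # choose one of the two rewirings
        f1, f2 = frozenset((i, v)), frozenset((u, j))
        if i == v or u == j or f1 in present or f2 in present:
            continue
        # p'/p only involves the two removed and the two created edges
        ratio = (weight[b[i]][b[v]] * weight[b[u]][b[j]]) / (
            weight[b[i]][b[j]] * weight[b[u]][b[v]])
        if rng.random() < ratio:                 # accepted with probability min(1, p'/p)
            present -= {frozenset((i, j)), frozenset((u, v))}
            present |= {f1, f2}
            edges[a], edges[c] = (i, v), (u, j)
    return edges

def within_fraction(edges, b):
    return sum(b[i] == b[j] for i, j in edges) / len(edges)

def degree_counts(edges):
    return Counter(x for edge in edges for x in edge)

N, B, w = 40, 4, 0.99
weight = [[w if r == s else 1 - w for s in range(B)] for r in range(B)]   # Eq. 61
rng = random.Random(3)
b = [i % B for i in range(N)]
rng.shuffle(b)                                   # equal-size blocks, random labels
start = [(i, (i + d) % N) for i in range(N) for d in (1, 2, 3)]   # simple 6-regular graph
sample = mh_edge_swaps(start, b, weight, n_steps=20000, seed=4)
print(within_fraction(start, b), within_fraction(sample, b))
```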

As can be seen in Fig. 2, the degree distribution is broad enough to cause intrinsic disassortative degree correlations in the generated sample. In the following, the same single sample from the ensemble will be used, to mimic the situation of empirically obtained data. However, we have repeated the analysis for different samples from the ensemble, and always found very similar results.
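The two ingredients of the test ensemble, Eqs. 61 and 62, are easy to sample. In this sketch the truncated Zipf degrees are drawn with `random.Random.choices`, using the parameters quoted above but only 1000 vertices; the parity fix, which nudges one degree by one when the total is odd so that the half-edges can be paired into edges, is an assumption not described in the text. The function names are hypothetical.

```python
import random

def zipf_degrees(n, gamma, k_min, k_max, seed=0):
    """Draw n degrees from P_k ∝ k^(-gamma) on [k_min, k_max] (Eq. 62)."""
    rng = random.Random(seed)
    ks = list(range(k_min, k_max + 1))
    degrees = rng.choices(ks, weights=[k ** -gamma for k in ks], k=n)
    if sum(degrees) % 2:                 # assumed parity fix: keep the total even
        degrees[0] += 1 if degrees[0] < k_max else -1
    return degrees

def diagonal_blockmodel(B, w):
    """Relative block weights of Eq. 61 (normalization left out)."""
    return [[w if r == s else 1 - w for s in range(B)] for r in range(B)]

degrees = zipf_degrees(1000, gamma=1.1, k_min=30, k_max=200, seed=1)
mean = sum(degrees) / len(degrees)
second = sum(k * k for k in degrees) / len(degrees)
print(mean, second / mean ** 2)          # a ratio well above 1 signals a broad distribution
```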

It is usually the case that one does not know a priori which value of B is the most appropriate. Hence, one must obtain the best partitions for several B values, and choose the one with the largest value of $\mathcal{L}$. However, the values of $\mathcal{L}$ will always increase monotonically with B, since the number of suitable models will become larger, while the data remains the same, culminating in the extreme situation where each vertex will belong to its own block, and the inferred $e_{rs}$ parameters will be given directly by the adjacency matrix. One can estimate how $\mathcal{L}$ should increase with B by exploiting the fact that the first term in Eq. 60 has the same functional form as the mutual information of two random variables x, y,

$$I(x, y) = \sum_{xy} p_{xy}\ln\left(\frac{p_{xy}}{p_x p_y}\right), \tag{63}$$



where $p_{xy}$ is the joint distribution of both variables. It is a known fact that the mutual information calculated from empirical distributions suffers from an upward systematic bias which disappears only as the number of samples goes to infinity [70]. Assuming the fluctuations of the counts in each bin of the distribution are independent, one can calculate this bias analytically as $\Delta I(x,y) = (X-1)(Y-1)/2N_s + O(1/N_s^2)$, where X and Y are the number of possible values of the x and y variables, respectively, and $N_s$ is the number of empirical samples [70]. Using this information, one can obtain an estimate for the dependence of $\mathcal{L}$ on B,



$$\mathcal{L}^* \approx \mathcal{L} - \frac{(B-1)^2}{2}, \tag{64}$$



where $\mathcal{L}^*$ is the expected "true" value of the log-likelihood, if the sample size goes to infinity [71]. This can be used to roughly differentiate between situations where the log-likelihood is increasing due to new block structures which are being discovered, and when it is only due to an artifact of the limited data.
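The bias quoted above is easy to observe numerically. The sketch below estimates Eq. 63 from samples of two independent uniform variables, whose true mutual information is zero, and compares the average estimate with $(X-1)(Y-1)/2N_s$. The function name, sample sizes and seed are arbitrary assumptions.

```python
import random
from collections import Counter
from math import log

def mutual_information(pairs):
    """Empirical mutual information (Eq. 63) of a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

# Independent x and y: the true value is zero, but the estimate is biased
# upwards by roughly (X - 1)(Y - 1) / (2 N_s).
X, Y, n_samples, reps = 4, 4, 1000, 100
rng = random.Random(7)
estimates = [
    mutual_information([(rng.randrange(X), rng.randrange(Y)) for _ in range(n_samples)])
    for _ in range(reps)
]
mean_estimate = sum(estimates) / reps
predicted = (X - 1) * (Y - 1) / (2 * n_samples)
print(mean_estimate, predicted)
```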

Fig. 3 shows the values of $\mathcal{L}$ for different L, for the same sample of the ensemble above. The likelihood increases monotonically until B = 4, after which it does not increase significantly. The values of $\mathcal{L}$ are significantly different for different L (which shows that the higher order terms in Eq. 60 should indeed not be neglected), but all curves indicate B = 4 as being the "true" partition size, which is indeed correct. However, a closer inspection of the resulting partitions reveals important differences. Fig. 4 shows some of the obtained partitions for different values of L and B. For B = 4, all values of L result in the same partition, which corresponds exactly to the correct partition. For larger values of B, however, the obtained partitions differ much more than one would guess by looking at the values of $\mathcal{L}$ alone. For B = 8 and L = 0 one sees a clear division into 8 blocks, which strongly separates vertices of different degrees. This could easily be mistaken for a true partition, despite the fact that it is nothing more than an entropic artifact of the broad degree distribution. Indeed, if one increases L, the optimal partition eventually becomes a random sub-partition of the correct B = 4 structure. In this particular example, L = 2 is enough to obtain the correct result, and higher values result in the same partition, with only negligible differences.

FIG. 3. (Color online) Left: Optimized log-likelihood $\mathcal{L}$ (Eq. 60) as a function of B, for different values of L, for the same sample from the ensemble with intrinsic degree correlations. Right: Average normalized mutual information (Eq. 65) between the degree sequence and the block partition, as a function of B, for different values of L.

FIG. 4. (Color online) Obtained block partitions for different values of B and L, for the same sample of the ensemble with intrinsic degree correlations. The colors indicate the partition, and the size of the vertices is proportional to the degree. Nodes of high degree are pushed towards the center of the layout. Note that for B = 8 and $L \in [0, 1]$, the nodes of high degree are segregated into separate blocks.

The correlation of the block partition with the degree sequence can be computed more precisely by using the mutual information $I(b,k)$ (Eq. 63) between the block labels and the degrees. Since we want to compare partitions obtained for different values of B, and changing B will invariably change $I(b,k)$, we use instead the average normalized mutual information, defined here as

$$\tilde{I}(b,k) = \left\langle \frac{I(b,k)}{I(r,k)} \right\rangle, \tag{65}$$

where $I(r,k)$ is the mutual information of the degree sequence and a random block partition $\{r_i\}$, obtained by shuffling the block labels $\{b_i\}$. The average is taken over several independent realizations of $\{r_i\}$. If the block partition is uncorrelated with the degree sequence, one should have that $\tilde{I}(b,k)$ is close to one, since there are no intrinsic correlations between the correct partition and the degrees. The values of $\tilde{I}(b,k)$ are shown in Fig. 3. One sees clearly that the results for lower values of L are significantly correlated with the degree sequence, and that for L > 2 the correlation essentially vanishes.
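A sketch of the average normalized mutual information of Eq. 65, assuming 20 random shuffles of the block labels. The function names, the toy degree sequence and the two partitions (one that follows the degrees and one drawn independently of them) are hypothetical; they only illustrate the expected behaviour, namely a value of order one without correlation and a much larger value when the partition tracks the degrees.

```python
import random
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (Eq. 63) between two label sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def normalized_mi(b, degrees, n_shuffles=20, seed=0):
    """Eq. 65: average of I(b, k) / I(r, k) over random permutations r of b."""
    rng = random.Random(seed)
    i_bk = mutual_information(b, degrees)
    ratios = []
    for _ in range(n_shuffles):
        r = list(b)
        rng.shuffle(r)
        ratios.append(i_bk / mutual_information(r, degrees))
    return sum(ratios) / n_shuffles

degrees = [1 + i % 10 for i in range(400)]
by_degree = [0 if k <= 5 else 1 for k in degrees]      # partition that tracks degree
rng = random.Random(1)
unrelated = [rng.randrange(2) for _ in degrees]        # partition independent of degree
corr, uncorr = normalized_mi(by_degree, degrees), normalized_mi(unrelated, degrees)
print(corr, uncorr)
```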

The reason why the log-likelihood with L = 0 delivers spurious block structures is intimately related to the fact that the degree distribution in this case is broad. This causes the remaining terms of Eq. 60 to become relevant, as they represent the entropic cost of an edge leading to a block with a broader degree distribution. On the other hand, the same entropic cost is responsible for the disassortative degree correlations seen in Fig. 2. This is in fact inconsistent with the assumption made when deriving Eq. 16, namely Eq. 15, which says that there are no such degree correlations. This is indeed true, and it means that Eq. 60, even for $L \to \infty$, is still an approximation which neglects certain entropic effects. However, as mentioned previously, it still captures a large portion of the entropic cost of placing an edge incident to a block with a broad degree sequence, and this is the reason why it can be used to infer the correct block structure in the example shown. The same performance should be expected in situations where intrinsic degree correlations are present, but not so strong as to require better approximations. Indeed, as was discussed previously following the derivation of Eq. 16, for networks with very large degrees Eq. 60 may diverge for sufficiently large L. However, this situation can be managed adequately. In Sec. III B we computed the entropy for the ensemble with soft degree constraints and arbitrary degree correlations, given in Eq. 10. This expression is exact, and can be used as a log-likelihood in the extreme situations where Eq. 60 is not a good approximation. The downside is that one needs to infer many more parameters, since the model is defined by the full matrix $e_{(r,k),(s,k')}$, which makes the maximization of $\mathcal{L}$ less computationally efficient, and may result in overfitting. A more efficient method will be described in the next section, which consists in separating vertices into groups of similar degree, and using this auxiliary partition to infer the actual block structure. This can be done in a way which allows one to control how much information needs to be inferred, such that the degree correlations (intrinsic or otherwise) are sufficiently accounted for.
FIG. 5. (Color online) Inferred block partitions for the model with extrinsic degree correlations, obtained by maximizing the log-likelihood $\mathcal{L}$, given by Eq. 60, for different values of B and L = 2.

FIG. 6. (Color online) Auxiliary and inferred block partitions for a sample of the ensemble with intrinsic degree correlations. Left: auxiliary partition $\{d_i\}$, with D = 8. Right: inferred partition $\{b_i\}$, with B = 8.



B. Extrinsic degree correlations 

We consider now the case where there are arbitrary extrinsic degree correlations (although the method described here also works well in situations with strong intrinsic degree correlations which are not well captured by Eq. 60). As an example, we will use a modified version of the blockmodel ensemble used in the previous section, which includes assortative degree correlations, defined as

$$e_{(r,k),(s,k')} \propto \frac{e_{rs}}{1 + |k - k'|}, \tag{66}$$

where $e_{rs}$ is given by Eq. 61. Similarly to the previous case, we consider a typical sample from this ensemble, with N = 10'^ vertices, B = 4, a block structure with w = 0.99, and a degree distribution with $\gamma = 1.1$ and $[k_{\min}, k_{\max}] = [30, 200]$. The degree correlations obtained in this sample are shown in Fig. 2.

FIG. 7. (Color online) Left: Optimized log-likelihood $\mathcal{L}$ (Eq. 60) as a function of B, for different values of L, for the same sample from the ensemble with extrinsic degree correlations. The legend "aux." indicates results obtained with the auxiliary degree-based partition described in the text. Right: Average normalized mutual information (Eq. 65) between the degree sequence and the block partition, as a function of B, for different values of L, and size of the auxiliary partition D (or without it if D is omitted).

If one does not know, or ignores, that there are degree correlations present, and attempts to detect the most likely block structure using Eq. 60, one obtains the block partitions shown in Fig. 5. Due to the high segregation of the modules, one indeed finds the correct block partition for B = 4, but as the value of B is increased, one finds increasingly many "sub-blocks" corresponding to groups of vertices of different degrees. This is simply a manifestation of the degree correlations present in Eq. 66. As Fig. 7 shows, the log-likelihood increases steadily with larger B values, indicating that the "true" block structure has not yet been found. Indeed, one would need to make B = 4K, where K is the number of different degrees in the network, to finally capture the complete structure. The correct inferred partition in this case would put vertices of the same degree in their own block, which we can label as (r, k). In this situation, Eq. 16 is no longer an approximation, since Eq. 17 will also hold exactly, and it becomes identical to Eq. 10, which we could use instead as a log-likelihood (which effectively removes the parameter L). Strictly speaking, Eq. 10 is entirely sufficient to detect any block structure with arbitrary degree correlations, either intrinsic or extrinsic. In practice, however, it is cumbersome to use, since it requires the inference of a large number of parameters, namely the full $e_{(r,k),(s,k')}$ matrix of size $(BK)^2$ (of which half the elements are independent parameters), as well as the $n_{(r,k)}$ vector of size BK. The number of different degrees K is often significantly large. For the specific example shown in Fig. 5 we have K = 168, which results in a parameter matrix which is much larger than the number of edges in the network. This is an undesired situation, since with such a large number of parameters, not only does it become easier to get trapped in local maxima when optimizing $\mathcal{L}$, but it also becomes impossible to discern between actual features of the inferred model and stochastic fluctuations which are frozen in the network structure. However, it is possible to circumvent this problem using the following approach. Before attempting to infer the block partition $\{b_i\}$, one constructs an auxiliary partition $\{d_i\}$ which remains fixed throughout the entire process. The auxiliary partition separates vertices into D blocks representing degree bins, so that vertices in the same block have similar degrees. Exactly how large each degree block should be, and how the bin boundaries should be chosen, will depend in general on specific network properties; however, a good starting point is to separate them into bins such that the total number of bins D is as small as possible, while at the same time keeping the degree variance within each bin also small. Furthermore, one should also avoid having degree bins with very few vertices, since this is more likely to lead to artifacts due to lack of statistics. With this auxiliary partition in hand, one can proceed to infer a block partition $\{b_i\}$ into B blocks, such that the combined block label of a given vertex i is $(b_i, d_i)$. The log-likelihood is computed using Eq. 60, using the full (b, d) block labels to differentiate between blocks. If the $\{d_i\}$ partition is reasonably chosen, the degree correlations will be inferred automatically, and from the $\{b_i\}$ partition it is possible to extract the block structure which is independent from degree correlations. Note, however, that after this procedure the $b_i$ labels by themselves do not represent a meaningful partition, since any relabeling of the form $(r, d) \leftrightarrow (s, d)$, for the same value of d, results in an entirely equivalent block structure. In order to obtain a meaningful $\{b_i\}$ partition, it is necessary to proceed as follows:

1. Maximize $\mathcal{L}$ using the auxiliary partition $\{d_i\}$, obtaining the best partition $\{(b_i, d_i)\}$.

2. Swap labels $(r, d) \leftrightarrow (s, d)$, within the same auxiliary block d, such that the log-likelihood $\mathcal{L}$, ignoring the auxiliary partition $\{d_i\}$, is maximized.

In step 2, the labels are swapped until no further improvement is possible. After step 2 is completed, the blockmodel obtained in step 1 remains unchanged, but the block labels $\{b_i\}$ now have a clear meaning, since they represent the best overall block structure, ignoring the auxiliary partition, among the possibilities which are equivalent to the inferred block partition.
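Both ingredients of this procedure can be sketched in a few lines. The auxiliary partition below uses degree bins holding roughly equal numbers of vertices, and the label-swap step scores candidate relabelings with only the L = 0 term of Eq. 60 for brevity. All function names, and the toy graph whose labels are deliberately scrambled across its two degree bins, are assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log

def degree_bins(degrees, D):
    """Auxiliary partition {d_i}: at most D degree bins holding roughly N/D
    vertices each; vertices of equal degree always share a bin."""
    counts, N = Counter(degrees), len(degrees)
    bin_of, cum, current = {}, 0, 0
    for k in sorted(counts):
        bin_of[k] = current
        cum += counts[k]
        if cum >= (current + 1) * N / D and current < D - 1:
            current += 1                 # this bin holds its share; open the next
    return [bin_of[k] for k in degrees]

def ll0(edges, b, B):
    """First term of Eq. 60 (L = 0), used as the score that ignores {d_i}."""
    e = [[0] * B for _ in range(B)]
    for i, j in edges:
        e[b[i]][b[j]] += 1
        e[b[j]][b[i]] += 1
    e_r = [sum(row) for row in e]
    return 0.5 * sum(e[r][s] * log(e[r][s] / (e_r[r] * e_r[s]))
                     for r in range(B) for s in range(B) if e[r][s] > 0)

def swap_labels(edges, b, d, B):
    """Step 2: within each auxiliary block, swap labels r <-> s whenever this
    increases ll0; the combined labels (b_i, d_i) are only renamed."""
    b = list(b)
    improved = True
    while improved:
        improved = False
        for dd in sorted(set(d)):
            for r, s in combinations(range(B), 2):
                swap = {r: s, s: r}
                trial = [swap.get(bi, bi) if di == dd else bi for bi, di in zip(b, d)]
                if ll0(edges, trial, B) > ll0(edges, b, B) + 1e-12:
                    b, improved = trial, True
    return b

# Two 5-cliques joined by one edge; only the two bridge vertices have degree 5.
edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
edges += [(u, v) for u in range(5, 10) for v in range(u + 1, 10)] + [(4, 5)]
degrees = [sum(x in edge for edge in edges) for x in range(10)]
d = degree_bins(degrees, 2)
scrambled = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]     # labels flipped across the two bins
b = swap_labels(edges, scrambled, d, 2)
print(d, b)
```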

The left panel of Fig. 6 shows an example auxiliary partition, with D = 8, and bin widths chosen so that all groups have approximately the same size. The right panel shows the inferred $\{b_i\}$ partition with B = 8, using the auxiliary partition, after the label swap step described above. Notice how the correlations with degree can no longer be distinguished visually. Observing how the log-likelihood increases with B (see Fig. 7), the results with the auxiliary partition point more convincingly to the B = 4 structure, since the log-likelihood does not increase significantly for larger block numbers. Fig. 7 also shows the average normalized mutual information between the block partitions and the degrees, and indeed the difference between the inference with and without the auxiliary partition is significant. For D = 8 one can still measure a residual correlation, but increasing the auxiliary partition to D = IQ virtually removes it, which is still significantly smaller than the total number of degrees K = 168.

VII. CONCLUSION

We have calculated analytical expressions for the entropy of stochastic blockmodel ensembles, both in their traditional and degree-corrected forms. We have considered all the fundamental variants of the ensembles, including directed and undirected graphs, as well as degree sequences implemented as soft and hard constraints. The expressions derived represent generalizations of the known entropies of random graphs with arbitrary degree sequence [24, which are easily recovered by setting the number of blocks to one.

As a straightforward application of the derived entropy 
functions, we applied them to the task of blockmodel 
inference, given observed data. We showed that this 
method can be used even in situations where there are in- 
trinsic (i.e. with an entropic origin) degree correlations, 
and can be easily adapted to the case with arbitrary ex- 
trinsic degree correlations. This approach represents a 
generalization of the one presented in [12], which is only
expected to work well with sparse graphs without very 
broad degree sequences. 

Furthermore, the blockmodel entropy could also be 
used as a more refined method to infer the relevance of 
topological features in empirical networks [T^, and to 
determine the statistical significance of modular network 
partitions [75^75] . 

Beyond the task of block detection, the knowledge of 
the entropy of these ensembles can be used to directly ob- 
tain the equilibrium properties of network systems which 
possess an energy function which depends directly on the 
block structure. Indeed this has been used in [T^ to con- 
struct a simplified model of gene regulatory system, in 
which the robustness can be expressed in terms of the 
block structure, functioning as an energy function. The 
evolutionary process acting on the system was mapped to 
a Gibbs ensemble, where the selective pressure plays the 
role of temperature. The equilibrium properties were ob- 
tained by minimizing the free energy, which was written 
using the blockmodel entropy. This model in particular 
exhibited a topological phase transition at higher values 
of selective pressure, where the network becomes assem- 
bled in a core-periphery structure, which is very similar 
to what is observed in real gene networks. We speculate 
that the blockmodel entropy can be used in the same 
manner to obtain properties of a wide variety of adaptive
networks |T5], for which stochastic blockmodels are adequate models.